/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1

aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 424
PASS: my_features_df and aa_df successfully combined
nrows: 424
ncols: 265

count of NULL values before imputation
or_mychisq          102
log10_or_mychisq    102
dtype: int64

count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64

PASS: OR values imputed, data ready for ML
No. of numerical features: 43
No. of categorical features: 7
index: 0
ind: 1
Mask count check: True
Original Data
Counter({1: 114, 0: 71})
Data dim: (185, 50)
-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training
actual values: training set
imputed values: blind test set
Train data size: (185, 50)
Test data size: (239, 50)
y_train numbers: Counter({1: 114, 0: 71})
y_train ratio: 0.6228070175438597
y_test_numbers: Counter({0: 120, 1: 119})
y_test ratio: 1.0084033613445378
-------------------------------------------------------------
Simple Random OverSampling
Counter({0: 114, 1: 114})
(228, 50)
Simple Random UnderSampling
Counter({0: 71, 1: 71})
(142, 50)
Simple Combined Over and UnderSampling
Counter({0: 114, 1: 114})
(228, 50)
SMOTE_NC OverSampling
Counter({0: 114, 1: 114})
(228, 50)
#####################################################################
Running ML analysis: UQ [without AA index but with active site annotations]
Gene name: pncA
Drug name: pyrazinamide
Output directory: /home/tanu/git/Data/pyrazinamide/output/ml/uq_v1/
Sanity checks:
Total input features: 50
Training data size: (185, 50)
Test data size: (239, 50)
Target feature numbers (training data): Counter({1: 114, 0: 71})
Target features ratio (training data: 0.6228070175438597
Target feature numbers (test data): Counter({0: 120, 1: 119})
Target features ratio (test data): 1.0084033613445378
#####################################################################
================================================================
Strucutral features (n): 34
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.01643085 0.01587701 0.01659703 0.01716876 0.0157814 0.01725698 0.0168314 0.01593757 0.01981902 0.01665521]
mean value: 0.016835522651672364

key: score_time
value: [0.01110053 0.01039171 0.01039815 0.01039696 0.01042223 0.01038051 0.01035452 0.01037598 0.01076961 0.01039314]
mean value: 0.010498332977294921

key: test_mcc
value: [0.33796318 0.58655573 0.28690229 0.67460105 0.6761234 0.64465837 1. 0.12182898 0.67005939 0.52299758]
mean value: 0.5521689989382099

key: train_mcc
value: [0.78194719 0.69251873 0.70439866 0.69166175 0.69166175 0.72007099 0.73268764 0.74454326 0.77164805 0.75735135]
mean value: 0.7288489368704532

key: test_accuracy
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.68421053 0.78947368 0.68421053 0.84210526 0.84210526 0.83333333 1. 
0.61111111 0.83333333 0.77777778] mean value: 0.789766081871345 key: train_accuracy value: [0.89759036 0.85542169 0.86144578 0.85542169 0.85542169 0.86826347 0.8742515 0.88023952 0.89221557 0.88622754] mean value: 0.8726498809609696 key: test_fscore value: [0.75 0.81818182 0.76923077 0.86956522 0.88888889 0.86956522 1. 0.72 0.88 0.83333333] mean value: 0.8398765244417419 key: train_fscore value: [0.9178744 0.88888889 0.89099526 0.88785047 0.88785047 0.89908257 0.90322581 0.90654206 0.91666667 0.91079812] mean value: 0.9009774700333214 key: test_precision value: [0.69230769 0.9 0.71428571 0.90909091 0.8 0.83333333 1. 0.64285714 0.78571429 0.76923077] mean value: 0.8046819846819847 key: train_precision value: [0.91346154 0.84210526 0.86238532 0.84821429 0.84821429 0.85217391 0.85964912 0.87387387 0.87610619 0.88181818] mean value: 0.8658001980381739 key: test_recall value: [0.81818182 0.75 0.83333333 0.83333333 1. 0.90909091 1. 0.81818182 1. 0.90909091] mean value: 0.8871212121212121 key: train_recall value: [0.9223301 0.94117647 0.92156863 0.93137255 0.93137255 0.95145631 0.95145631 0.94174757 0.96116505 0.94174757] mean value: 0.9395393108699791 key: test_roc_auc value: [0.65909091 0.80357143 0.63095238 0.8452381 0.78571429 0.81168831 1. 0.55194805 0.78571429 0.74025974] mean value: 0.7614177489177489 key: train_roc_auc value: [0.88973648 0.82996324 0.84359681 0.83287377 0.83287377 0.84291566 0.85072816 0.86149879 0.87120752 0.86931129] mean value: 0.8524705482921324 key: test_jcc value: [0.6 0.69230769 0.625 0.76923077 0.8 0.76923077 1. 
0.5625 0.78571429 0.71428571] mean value: 0.7318269230769231 key: train_jcc value: [0.84821429 0.8 0.8034188 0.79831933 0.79831933 0.81666667 0.82352941 0.82905983 0.84615385 0.8362069 ] mean value: 0.8199888394792046 MCC on Blind test: 0.32 Accuracy on Blind test: 0.64 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.41101408 0.43322134 0.43520188 0.423666 0.40705562 0.42979956 0.44227242 0.42043042 0.44399595 0.42415285] mean value: 0.4270810127258301 key: score_time value: [0.01099348 0.01108956 0.01145554 0.01103401 0.01119256 0.02155566 0.01103187 0.01108336 0.0113132 0.01128364] mean value: 0.012203288078308106 key: test_mcc value: [0.45868247 0.54761905 0.88949918 0.80507649 1. 0.76623377 0.71350607 0.52299758 0.67005939 0.4025974 ] mean value: 0.6776271401537742 key: train_mcc value: [0.93615116 0.87323164 0.8982762 0.88572497 0.91158328 1. 0.87286094 0.89863369 0.94933931 0.98737524] mean value: 0.9213176411447679 key: test_accuracy value: [0.73684211 0.78947368 0.94736842 0.89473684 1. 
0.88888889 0.83333333 0.77777778 0.83333333 0.66666667] mean value: 0.8368421052631578 key: train_accuracy value: [0.96987952 0.93975904 0.95180723 0.94578313 0.95783133 1. 0.94011976 0.95209581 0.9760479 0.99401198] mean value: 0.9627335690065651 key: test_fscore value: [0.8 0.83333333 0.96 0.90909091 1. 0.90909091 0.84210526 0.83333333 0.88 0.66666667] mean value: 0.8633620414673047 key: train_fscore value: [0.97607656 0.95238095 0.96153846 0.9569378 0.96650718 1. 0.95192308 0.96190476 0.98076923 0.99516908] mean value: 0.9703207096742565 key: test_precision value: [0.71428571 0.83333333 0.92307692 1. 1. 0.90909091 1. 0.76923077 0.78571429 0.85714286] mean value: 0.8791874791874792 key: train_precision value: [0.96226415 0.92592593 0.94339623 0.93457944 0.94392523 1. 0.94285714 0.94392523 0.97142857 0.99038462] mean value: 0.9558686539496802 key: test_recall value: [0.90909091 0.83333333 1. 0.83333333 1. 0.90909091 0.72727273 0.90909091 1. 0.54545455] mean value: 0.8666666666666667 key: train_recall value: [0.99029126 0.98039216 0.98039216 0.98039216 0.99019608 1. 0.96116505 0.98058252 0.99029126 1. ] mean value: 0.9853702646106987 key: test_roc_auc value: [0.70454545 0.77380952 0.92857143 0.91666667 1. 0.88311688 0.86363636 0.74025974 0.78571429 0.7012987 ] mean value: 0.8297619047619048 key: train_roc_auc value: [0.9633996 0.92769608 0.94332108 0.93550858 0.94822304 1. 0.93370752 0.94341626 0.97170813 0.9921875 ] mean value: 0.9559167791307461 key: test_jcc value: [0.66666667 0.71428571 0.92307692 0.83333333 1. 0.83333333 0.72727273 0.71428571 0.78571429 0.5 ] mean value: 0.7697968697968698 key: train_jcc value: [0.95327103 0.90909091 0.92592593 0.91743119 0.93518519 1. 
0.90825688 0.9266055 0.96226415 0.99038462] mean value: 0.9428415392549067 MCC on Blind test: 0.2 Accuracy on Blind test: 0.59 Model_name: Gaussian NB Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.00974631 0.00932074 0.0071764 0.0069828 0.00684166 0.0073626 0.00683832 0.00736499 0.00704551 0.00715041] mean value: 0.007582974433898926 key: score_time value: [0.01076746 0.01019335 0.00825691 0.00814319 0.00808716 0.00803375 0.00830436 0.00816035 0.00810671 0.00811577] mean value: 0.008616900444030762 key: test_mcc value: [ 0.5077524 0.26772484 -0.12677314 0.40849122 0.09356015 0.39594419 0.44320263 0.0805823 0.0805823 0.56061191] mean value: 0.2711678789936081 key: train_mcc value: [0.39956942 0.36799004 0.44276724 0.40782666 0.39882278 0.42873208 0.40887563 0.43322852 0.42873208 0.41898177] mean value: 0.41355262056408115 key: test_accuracy value: [0.73684211 0.68421053 0.52631579 0.73684211 0.63157895 0.72222222 0.72222222 0.61111111 0.61111111 0.77777778] mean value: 0.6760233918128655 key: train_accuracy value: [0.72289157 0.69277108 0.74096386 0.72289157 0.71686747 0.73053892 0.7245509 0.73652695 0.73053892 0.73053892] mean value: 
0.7249080152947118 key: test_fscore value: [0.81481481 0.78571429 0.66666667 0.81481481 0.75862069 0.8 0.81481481 0.74074074 0.74074074 0.84615385] mean value: 0.7783081414115897 key: train_fscore value: [0.81147541 0.8 0.81702128 0.80991736 0.80816327 0.81632653 0.81147541 0.81666667 0.81632653 0.81327801] mean value: 0.8120650453135811 key: test_precision value: [0.6875 0.6875 0.6 0.73333333 0.64705882 0.71428571 0.6875 0.625 0.625 0.73333333] mean value: 0.6740511204481793 key: train_precision value: [0.70212766 0.66666667 0.72180451 0.7 0.69230769 0.70422535 0.70212766 0.71532847 0.70422535 0.71014493] mean value: 0.7018958288316359 key: test_recall value: [1. 0.91666667 0.75 0.91666667 0.91666667 0.90909091 1. 0.90909091 0.90909091 1. ] mean value: 0.9227272727272727 key: train_recall value: [0.96116505 1. 0.94117647 0.96078431 0.97058824 0.97087379 0.96116505 0.95145631 0.97087379 0.95145631] mean value: 0.9639539310869979 key: test_roc_auc value: [0.6875 0.60119048 0.44642857 0.67261905 0.5297619 0.66883117 0.64285714 0.52597403 0.52597403 0.71428571] mean value: 0.6015422077922078 key: train_roc_auc value: [0.64724919 0.6015625 0.68152574 0.65226716 0.64154412 0.65731189 0.65245752 0.67104066 0.65731189 0.66322816] mean value: 0.6525498822101656 key: test_jcc value: [0.6875 0.64705882 0.5 0.6875 0.61111111 0.66666667 0.6875 0.58823529 0.58823529 0.73333333] mean value: 0.6397140522875817 key: train_jcc value: [0.68275862 0.66666667 0.69064748 0.68055556 0.67808219 0.68965517 0.68275862 0.69014085 0.68965517 0.68531469] mean value: 0.6836235012609437 MCC on Blind test: 0.44 Accuracy on Blind test: 0.69 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00754261 0.00742865 0.00720763 0.00705886 0.00744486 0.00740099 0.00711918 0.00755024 0.00717258 0.00742507] mean value: 0.007335066795349121 key: score_time value: [0.00887275 0.00809169 0.00855422 0.00822783 0.00839949 0.0080905 0.00788832 0.00815272 0.00828528 0.00825739] mean value: 0.008282017707824708 key: test_mcc value: [ 0.21660006 0.32142857 0.23262105 0.28690229 0.28690229 0.43320011 0.16116459 -0.24029619 0.40291148 0.40291148] mean value: 0.2504345746462975 key: train_mcc value: [0.34619876 0.33098314 0.29538063 0.35569507 0.35404664 0.3240165 0.35981593 0.37214605 0.27958995 0.33041139] mean value: 0.3348284059138056 key: test_accuracy value: [0.63157895 0.68421053 0.63157895 0.68421053 0.68421053 0.72222222 0.61111111 0.44444444 0.72222222 0.72222222] mean value: 0.6538011695906433 key: train_accuracy value: [0.69879518 0.69277108 0.6746988 0.70481928 0.69879518 0.68862275 0.70658683 0.71257485 0.67065868 0.68862275] mean value: 0.6936945386335762 key: test_fscore value: [0.72 0.75 0.69565217 0.76923077 0.76923077 0.76190476 0.69565217 0.58333333 0.7826087 0.7826087 ] mean value: 0.7310221372830068 key: train_fscore value: [0.76635514 0.76497696 0.74766355 0.77625571 0.76190476 0.75925926 0.77625571 0.78181818 0.74885845 0.75471698] mean value: 0.7638064697242107 key: test_precision value: [0.64285714 0.75 0.72727273 0.71428571 0.71428571 0.8 0.66666667 0.53846154 0.75 0.75 ] mean value: 
0.7053829503829504 key: train_precision value: [0.73873874 0.72173913 0.71428571 0.72649573 0.74074074 0.72566372 0.73275862 0.73504274 0.70689655 0.73394495] mean value: 0.7276306629094831 key: test_recall value: [0.81818182 0.75 0.66666667 0.83333333 0.83333333 0.72727273 0.72727273 0.63636364 0.81818182 0.81818182] mean value: 0.7628787878787879 key: train_recall value: [0.7961165 0.81372549 0.78431373 0.83333333 0.78431373 0.7961165 0.82524272 0.83495146 0.7961165 0.77669903] mean value: 0.8040928992956405 key: test_roc_auc value: [0.59659091 0.66071429 0.61904762 0.63095238 0.63095238 0.72077922 0.57792208 0.38961039 0.69480519 0.69480519] mean value: 0.6216179653679654 key: train_roc_auc value: [0.66789952 0.65686275 0.64215686 0.66666667 0.67340686 0.65587075 0.67043386 0.67528823 0.63243325 0.66178701] mean value: 0.6602805766319473 key: test_jcc value: [0.5625 0.6 0.53333333 0.625 0.625 0.61538462 0.53333333 0.41176471 0.64285714 0.64285714] mean value: 0.5792030273647921 key: train_jcc value: [0.62121212 0.61940299 0.59701493 0.63432836 0.61538462 0.6119403 0.63432836 0.64179104 0.59854015 0.60606061] mean value: 0.6180003458791998 MCC on Blind test: 0.51 Accuracy on Blind test: 0.74 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00691724 0.00921154 0.00725603 0.00683308 0.00641084 0.0067122 0.00755239 0.00700617 0.00747585 0.00672269] mean value: 0.00720980167388916 key: score_time value: [0.04755116 0.03781438 0.01461935 0.01340771 0.01276135 0.01304817 0.01399469 0.01276302 0.01416636 0.01354003] mean value: 0.01936662197113037 key: test_mcc value: [ 0.33796318 0.14085904 0.32142857 -0.33071891 -0.20865621 0.12182898 -0.02548236 0.2987013 0.12182898 0.53246753] mean value: 0.13102201054732363 key: train_mcc value: [0.51724228 0.58603243 0.6140767 0.51866448 0.57255314 0.57404517 0.54744208 0.57404517 0.6296076 0.53388143] mean value: 0.5667590493666902 key: test_accuracy value: [0.68421053 0.63157895 0.68421053 0.47368421 0.47368421 0.61111111 0.5 0.66666667 0.61111111 0.77777778] mean value: 0.6114035087719298 key: train_accuracy value: [0.77710843 0.80722892 0.81927711 0.77710843 0.80120482 0.80239521 0.79041916 0.80239521 0.82634731 0.78443114] mean value: 0.7987915734795469 key: test_fscore value: [0.75 0.74074074 0.75 0.64285714 0.61538462 0.72 0.57142857 0.72727273 0.72 0.81818182] mean value: 0.7055865615865616 key: train_fscore value: [0.83842795 0.85321101 0.86363636 0.83257919 0.84651163 0.85067873 0.84304933 0.85067873 0.86995516 0.83486239] mean value: 0.8483590469525649 key: test_precision value: [0.69230769 0.66666667 0.75 0.5625 0.57142857 0.64285714 0.6 0.72727273 0.64285714 0.81818182] mean value: 0.6674071761571762 key: train_precision value: [0.76190476 0.80172414 0.80508475 0.77310924 0.80530973 0.79661017 0.78333333 
0.79661017 0.80833333 0.79130435] mean value: 0.7923323977285066 key: test_recall value: [0.81818182 0.83333333 0.75 0.75 0.66666667 0.81818182 0.54545455 0.72727273 0.81818182 0.81818182] mean value: 0.7545454545454545 key: train_recall value: [0.93203883 0.91176471 0.93137255 0.90196078 0.89215686 0.91262136 0.91262136 0.91262136 0.94174757 0.88349515] mean value: 0.9132400533028746 key: test_roc_auc value: [0.65909091 0.55952381 0.66071429 0.375 0.4047619 0.55194805 0.48701299 0.64935065 0.55194805 0.76623377] mean value: 0.5665584415584416 key: train_roc_auc value: [0.72792418 0.77619485 0.78599877 0.74004289 0.77420343 0.76881068 0.75318568 0.76881068 0.79118629 0.75424757] mean value: 0.7640605028419135 key: test_jcc value: [0.6 0.58823529 0.6 0.47368421 0.44444444 0.5625 0.4 0.57142857 0.5625 0.69230769] mean value: 0.5495100212824671 key: train_jcc value: [0.72180451 0.744 0.76 0.71317829 0.73387097 0.74015748 0.72868217 0.74015748 0.76984127 0.71653543] mean value: 0.7368227607678467 MCC on Blind test: 0.22 Accuracy on Blind test: 0.6 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.00955367 0.00924325 0.00820327 0.00793886 0.00868225 0.00840187 0.00826693 0.00931406 0.00927305 0.00903201] mean value: 0.008790922164916993 key: score_time value: [0.0091145 0.00844264 0.00794554 0.00818419 0.00846839 0.00918674 0.00848913 0.00863361 0.00857615 0.00812507] mean value: 0.008516597747802734 key: test_mcc value: [ 0.34405118 0.14085904 -0.03149704 0.14085904 0.3086067 0.56061191 0.44320263 0.0805823 0.3040345 0.56061191] mean value: 0.2851922165850045 key: train_mcc value: [0.65495721 0.59292706 0.63691667 0.64636933 0.56076174 0.57399753 0.57517958 0.70283753 0.55505316 0.64203075] mean value: 0.6141030557952815 key: test_accuracy value: [0.68421053 0.63157895 0.57894737 0.63157895 0.68421053 0.77777778 0.72222222 0.61111111 0.66666667 0.77777778] mean value: 0.6766081871345029 key: train_accuracy value: [0.8313253 0.79518072 0.8253012 0.8253012 0.78313253 0.79041916 0.79640719 0.85628743 0.78443114 0.82634731] mean value: 0.8114133179424284 key: test_fscore value: [0.76923077 0.74074074 0.71428571 0.74074074 0.8 0.84615385 0.81481481 0.74074074 0.78571429 0.84615385] mean value: 0.7798575498575498 key: train_fscore value: [0.87931034 0.85714286 0.8722467 0.87445887 0.8487395 0.85355649 0.85470085 0.89380531 0.8487395 0.87445887] mean value: 0.8657159288311089 key: test_precision value: [0.66666667 0.66666667 0.625 0.66666667 0.66666667 0.73333333 0.6875 0.625 0.64705882 0.73333333] mean value: 0.6717892156862745 key: train_precision value: [0.79069767 0.75 0.792 0.78294574 0.74264706 0.75 0.76335878 
0.82113821 0.74814815 0.7890625 ] mean value: 0.7729998107832459 key: test_recall value: [0.90909091 0.83333333 0.83333333 0.83333333 1. 1. 1. 0.90909091 1. 1. ] mean value: 0.9318181818181819 key: train_recall value: [0.99029126 1. 0.97058824 0.99019608 0.99019608 0.99029126 0.97087379 0.98058252 0.98058252 0.98058252] mean value: 0.9844184275652008 key: test_roc_auc value: [0.64204545 0.55952381 0.48809524 0.55952381 0.57142857 0.71428571 0.64285714 0.52597403 0.57142857 0.71428571] mean value: 0.5989448051948052 key: train_roc_auc value: [0.78085992 0.734375 0.78216912 0.77634804 0.72166054 0.72952063 0.74324939 0.81841626 0.72466626 0.77935376] mean value: 0.759061892354029 key: test_jcc value: [0.625 0.58823529 0.55555556 0.58823529 0.66666667 0.73333333 0.6875 0.58823529 0.64705882 0.73333333] mean value: 0.6413153594771241 key: train_jcc value: [0.78461538 0.75 0.7734375 0.77692308 0.73722628 0.74452555 0.74626866 0.808 0.73722628 0.77692308] mean value: 0.7635145797367737 MCC on Blind test: 0.42 Accuracy on Blind test: 0.67 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.67604995 0.61749172 0.58538461 0.58932328 0.74973798 0.71265984 0.60874438 0.61769533 0.61801505 0.55964708] mean value: 0.6334749221801758 key: score_time value: [0.01328945 0.01198721 0.01105618 0.0122695 0.01297355 0.01261806 0.01269841 0.01275897 0.01224279 0.01214409] mean value: 0.01240382194519043 key: test_mcc value: [0.45868247 0.28690229 0.67460105 0.45361105 0.88949918 0.64465837 0.71350607 0.12182898 0.2548236 0.2987013 ] mean value: 0.4796814363849572 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73684211 0.68421053 0.84210526 0.73684211 0.94736842 0.83333333 0.83333333 0.61111111 0.66666667 0.66666667] mean value: 0.7558479532163742 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 0.76923077 0.86956522 0.7826087 0.96 0.86956522 0.84210526 0.72 0.76923077 0.72727273] mean value: 0.8109578659326944 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.71428571 0.71428571 0.90909091 0.81818182 0.92307692 0.83333333 1. 
0.64285714 0.66666667 0.72727273] mean value: 0.7949050949050949 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.83333333 0.83333333 0.75 1. 0.90909091 0.72727273 0.81818182 0.90909091 0.72727273] mean value: 0.8416666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.70454545 0.63095238 0.8452381 0.73214286 0.92857143 0.81168831 0.86363636 0.55194805 0.5974026 0.64935065] mean value: 0.7315476190476191 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.66666667 0.625 0.76923077 0.64285714 0.92307692 0.76923077 0.72727273 0.5625 0.625 0.57142857] mean value: 0.688226356976357 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.3 Accuracy on Blind test: 0.65 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, 
max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01134348 0.01067019 0.00884771 0.00852466 0.0083487 0.00794363 0.00782943 0.00811911 0.00781536 0.0099113 ] mean value: 0.008935356140136718 key: score_time value: [0.01315331 0.00911045 0.00869799 0.00861573 0.00856686 0.00791645 0.00785732 0.00792217 0.00788569 0.00925827] mean value: 0.008898425102233886 key: test_mcc value: [0.45361105 0.89559105 1. 0.89559105 0.67460105 0.66254135 0.89188259 0.26856633 0.88640526 0.76623377] mean value: 0.7395023497912928 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73684211 0.94736842 1. 0.94736842 0.84210526 0.83333333 0.94444444 0.66666667 0.94444444 0.88888889] mean value: 0.8751461988304093 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.7826087 0.95652174 1. 0.95652174 0.86956522 0.85714286 0.95238095 0.75 0.95652174 0.90909091] mean value: 0.8990353849049502 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 1. 1. 1. 0.90909091 0.9 1. 0.69230769 0.91666667 0.90909091] mean value: 0.9077156177156177 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.91666667 1. 0.91666667 0.83333333 0.81818182 0.90909091 0.81818182 1. 0.90909091] mean value: 0.8939393939393939 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.72159091 0.95833333 1. 
0.95833333 0.8452381 0.83766234 0.95454545 0.62337662 0.92857143 0.88311688] mean value: 0.8710768398268398 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.64285714 0.91666667 1. 0.91666667 0.76923077 0.75 0.90909091 0.6 0.91666667 0.83333333] mean value: 0.8254512154512155 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.04 Accuracy on Blind test: 0.51 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.08982825 0.08871984 0.08893657 0.08521676 0.08517504 0.09118032 0.09140134 0.09060311 0.08537102 0.08313799] mean value: 0.08795702457427979 key: score_time value: [0.01784706 0.01710677 0.01794147 0.01741266 0.0171783 0.01783466 0.01772738 0.01788592 0.01994085 0.01653624] mean value: 0.017741131782531738 key: test_mcc value: [0.33796318 0.65477023 0.65477023 0.54761905 0.88949918 0.76623377 0.88640526 0.26856633 0.67005939 0.77742884] mean value: 0.6453315461934368 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.68421053 0.84210526 0.84210526 0.78947368 0.94736842 0.88888889 0.94444444 0.66666667 0.83333333 0.88888889] mean value: 0.8327485380116959 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.75 0.88 0.88 0.83333333 0.96 0.90909091 0.95652174 0.75 0.88 0.91666667] mean value: 0.8715612648221344 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.69230769 0.84615385 0.84615385 0.83333333 0.92307692 0.90909091 0.91666667 0.69230769 0.78571429 0.84615385] mean value: 0.8290959040959041 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.91666667 0.91666667 0.83333333 1. 0.90909091 1. 0.81818182 1. 1. ] mean value: 0.9212121212121213 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.65909091 0.81547619 0.81547619 0.77380952 0.92857143 0.88311688 0.92857143 0.62337662 0.78571429 0.85714286] mean value: 0.8070346320346321 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6 0.78571429 0.78571429 0.71428571 0.92307692 0.83333333 0.91666667 0.6 0.78571429 0.84615385] mean value: 0.779065934065934 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.36 Accuracy on Blind test: 0.65 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.0070622 0.00688052 0.00693893 0.00693154 0.00710917 0.00692058 0.00734472 0.0074923 0.00760174 0.00686693] mean value: 0.007114863395690918 key: score_time value: [0.00797033 0.00801826 0.00802183 0.0084672 0.00798321 0.00860476 0.00797367 0.00884962 0.0084374 0.00873232] mean value: 0.008305859565734864 key: test_mcc value: [ 0.4719399 0.20935895 0.32142857 0.01163105 0.0952381 -0.06493506 0.20385888 0.11396058 -0.0805823 0.2548236 ] mean value: 0.1536722257776471 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73684211 0.57894737 0.68421053 0.52631579 0.57894737 0.44444444 0.61111111 0.55555556 0.5 0.66666667] mean value: 0.5883040935672514 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.76190476 0.6 0.75 0.60869565 0.66666667 0.44444444 0.66666667 0.6 0.60869565 0.76923077] mean value: 0.6476304613261135 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.8 0.75 0.75 0.63636364 0.66666667 0.57142857 0.7 0.66666667 0.58333333 0.66666667] mean value: 0.6791125541125541 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.5 0.75 0.58333333 0.66666667 0.36363636 0.63636364 0.54545455 0.63636364 0.90909091] mean value: 0.6318181818181818 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.73863636 0.60714286 0.66071429 0.50595238 0.54761905 0.46753247 0.6038961 0.55844156 0.46103896 0.5974026 ] mean value: 0.5748376623376623 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.61538462 0.42857143 0.6 0.4375 0.5 0.28571429 0.5 0.42857143 0.4375 0.625 ] mean value: 0.4858241758241758 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.15 Accuracy on Blind test: 0.57 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.08540225 1.07859063 1.06257463 1.06920266 1.05264735 1.05303788 1.1160934 1.0755322 1.05426216 1.04404473] mean value: 1.0691387891769408 key: score_time value: [0.09499049 0.09243846 0.09022665 0.09180641 0.08709741 0.08691168 0.08712554 0.08879185 0.0880568 0.08689451] mean value: 0.08943397998809814 key: test_mcc value: [0.45868247 1. 1. 0.77380952 0.88949918 0.76623377 1. 0.56061191 0.88640526 0.64465837] mean value: 0.7979900484560085 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73684211 1. 1. 0.89473684 0.94736842 0.88888889 1. 0.77777778 0.94444444 0.83333333] mean value: 0.9023391812865497 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 1. 1. 0.91666667 0.96 0.90909091 1. 0.84615385 0.95652174 0.86956522] mean value: 0.9257998378433161 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.71428571 1. 1. 0.91666667 0.92307692 0.90909091 1. 0.73333333 0.91666667 0.83333333] mean value: 0.8946453546453547 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 1. 1. 0.91666667 1. 0.90909091 1. 1. 1. 0.90909091] mean value: 0.9643939393939394 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.70454545 1. 1. 0.88690476 0.92857143 0.88311688 1. 0.71428571 0.92857143 0.81168831] mean value: 0.8857683982683983 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.66666667 1. 1. 0.84615385 0.92307692 0.83333333 1. 0.73333333 0.91666667 0.76923077] mean value: 0.8688461538461538 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.17 Accuracy on Blind test: 0.56 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', 
SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. 
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( key: fit_time value: [1.74963999 0.88859892 0.84024143 0.96218228 0.93069863 0.91661644 0.85723209 0.860641 0.84781289 0.82444263] mean value: 0.9678106307983398 key: score_time value: [0.22211456 0.18948603 0.20391059 0.21200871 0.21816325 0.22368956 0.13365841 0.19525099 0.19360924 0.23414063] mean value: 0.20260319709777833 key: test_mcc value: [0.60553007 0.89559105 0.88949918 0.77380952 0.88949918 0.76623377 1.
0.39594419 0.88640526 0.77742884] mean value: 0.7879941063317681 key: train_mcc value: [0.89849587 0.86235326 0.8501742 0.86235326 0.87457979 0.86499607 0.86279135 0.89953068 0.87498674 0.8872319 ] mean value: 0.8737493106163656 key: test_accuracy value: [0.78947368 0.94736842 0.94736842 0.89473684 0.94736842 0.88888889 1. 0.72222222 0.94444444 0.88888889] mean value: 0.8970760233918128 key: train_accuracy value: [0.95180723 0.93373494 0.92771084 0.93373494 0.93975904 0.93413174 0.93413174 0.95209581 0.94011976 0.94610778] mean value: 0.9393333814299113 key: test_fscore value: [0.84615385 0.95652174 0.96 0.91666667 0.96 0.90909091 1. 0.8 0.95652174 0.91666667] mean value: 0.9221621566838958 key: train_fscore value: [0.96226415 0.94835681 0.94392523 0.94835681 0.95283019 0.94930876 0.94883721 0.96226415 0.95327103 0.95774648] mean value: 0.9527160811207689 key: test_precision value: [0.73333333 1. 0.92307692 0.91666667 0.92307692 0.90909091 1. 0.71428571 0.91666667 0.84615385] mean value: 0.8882350982350983 key: train_precision value: [0.93577982 0.90990991 0.90178571 0.90990991 0.91818182 0.90350877 0.91071429 0.93577982 0.91891892 0.92727273] mean value: 0.9171761689150632 key: test_recall value: [1. 0.91666667 1. 0.91666667 1. 0.90909091 1. 0.90909091 1. 1. ] mean value: 0.9651515151515151 key: train_recall value: [0.99029126 0.99019608 0.99019608 0.99019608 0.99019608 1. 0.99029126 0.99029126 0.99029126 0.99029126] mean value: 0.9912240624405102 key: test_roc_auc value: [0.75 0.95833333 0.92857143 0.88690476 0.92857143 0.88311688 1. 0.66883117 0.92857143 0.85714286] mean value: 0.879004329004329 key: train_roc_auc value: [0.93959008 0.91697304 0.90916054 0.91697304 0.92478554 0.9140625 0.91702063 0.94045813 0.92483313 0.93264563] mean value: 0.9236502256646996 key: test_jcc value: [0.73333333 0.91666667 0.92307692 0.84615385 0.92307692 0.83333333 1. 
0.66666667 0.91666667 0.84615385] mean value: 0.8605128205128205 key: train_jcc value: [0.92727273 0.90178571 0.89380531 0.90178571 0.90990991 0.90350877 0.90265487 0.92727273 0.91071429 0.91891892] mean value: 0.9097628946580972 MCC on Blind test: 0.25 Accuracy on Blind test: 0.59 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00777102 0.00735879 0.00705314 0.00707221 0.00719404 0.00715971 0.00785017 0.00709414 0.00755572 0.00760627] mean value: 0.00737152099609375 key: score_time value: [0.00856996 0.00846505 0.00813699 0.00842476 0.00811648 0.00864673 0.00846338 0.00875568 0.00872493 0.00881457] mean value: 0.008511853218078614 key: test_mcc value: [ 0.21660006 0.32142857 0.23262105 0.28690229 0.28690229 0.43320011 0.16116459 -0.24029619 0.40291148 0.40291148] mean value: 0.2504345746462975 key: train_mcc value: [0.34619876 0.33098314 0.29538063 0.35569507 0.35404664 0.3240165 0.35981593 0.37214605 0.27958995 0.33041139] mean value: 0.3348284059138056 key: test_accuracy value: [0.63157895 0.68421053 0.63157895 0.68421053 0.68421053 0.72222222 0.61111111 0.44444444 0.72222222 0.72222222] mean value: 
0.6538011695906433 key: train_accuracy value: [0.69879518 0.69277108 0.6746988 0.70481928 0.69879518 0.68862275 0.70658683 0.71257485 0.67065868 0.68862275] mean value: 0.6936945386335762 key: test_fscore value: [0.72 0.75 0.69565217 0.76923077 0.76923077 0.76190476 0.69565217 0.58333333 0.7826087 0.7826087 ] mean value: 0.7310221372830068 key: train_fscore value: [0.76635514 0.76497696 0.74766355 0.77625571 0.76190476 0.75925926 0.77625571 0.78181818 0.74885845 0.75471698] mean value: 0.7638064697242107 key: test_precision value: [0.64285714 0.75 0.72727273 0.71428571 0.71428571 0.8 0.66666667 0.53846154 0.75 0.75 ] mean value: 0.7053829503829504 key: train_precision value: [0.73873874 0.72173913 0.71428571 0.72649573 0.74074074 0.72566372 0.73275862 0.73504274 0.70689655 0.73394495] mean value: 0.7276306629094831 key: test_recall value: [0.81818182 0.75 0.66666667 0.83333333 0.83333333 0.72727273 0.72727273 0.63636364 0.81818182 0.81818182] mean value: 0.7628787878787879 key: train_recall value: [0.7961165 0.81372549 0.78431373 0.83333333 0.78431373 0.7961165 0.82524272 0.83495146 0.7961165 0.77669903] mean value: 0.8040928992956405 key: test_roc_auc value: [0.59659091 0.66071429 0.61904762 0.63095238 0.63095238 0.72077922 0.57792208 0.38961039 0.69480519 0.69480519] mean value: 0.6216179653679654 key: train_roc_auc value: [0.66789952 0.65686275 0.64215686 0.66666667 0.67340686 0.65587075 0.67043386 0.67528823 0.63243325 0.66178701] mean value: 0.6602805766319473 key: test_jcc value: [0.5625 0.6 0.53333333 0.625 0.625 0.61538462 0.53333333 0.41176471 0.64285714 0.64285714] mean value: 0.5792030273647921 key: train_jcc value: [0.62121212 0.61940299 0.59701493 0.63432836 0.61538462 0.6119403 0.63432836 0.64179104 0.59854015 0.60606061] mean value: 0.6180003458791998 MCC on Blind test: 0.51 Accuracy on Blind test: 0.74 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic 
GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.07428098 0.04616737 0.04378724 0.04078841 0.05201912 0.16673613 0.0367384 0.03498602 0.03795505 0.03922486] mean value: 0.0572683572769165 key: score_time value: [0.0104301 0.0102849 0.01059461 0.01035333 0.01026797 0.01001692 0.00953746 0.00964141 0.00958657 0.00953507] mean value: 0.010024833679199218 key: test_mcc value: [0.56729535 0.88949918 0.89559105 1. 0.77380952 0.76623377 1. 0.39594419 0.88640526 0.66254135] mean value: 0.7837319665122223 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.78947368 0.94736842 0.94736842 1. 0.89473684 0.88888889 1. 
0.72222222 0.94444444 0.83333333] mean value: 0.8967836257309941 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.83333333 0.96 0.95652174 1. 0.91666667 0.90909091 1. 0.8 0.95652174 0.85714286] mean value: 0.9189277244494636 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.76923077 0.92307692 1. 1. 0.91666667 0.90909091 1. 0.71428571 0.91666667 0.9 ] mean value: 0.9049017649017649 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 1. 0.91666667 1. 0.91666667 0.90909091 1. 0.90909091 1. 0.81818182] mean value: 0.9378787878787879 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.76704545 0.92857143 0.95833333 1. 0.88690476 0.88311688 1. 0.66883117 0.92857143 0.83766234] mean value: 0.8859036796536797 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.71428571 0.92307692 0.91666667 1. 0.84615385 0.83333333 1. 0.66666667 0.91666667 0.75 ] mean value: 0.8566849816849816 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.07 Accuracy on Blind test: 0.52 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01267529 0.01247311 0.01756597 0.03184366 0.03098655 0.03128266 0.03108382 0.03088689 0.03076506 0.03127789] mean value: 0.026084089279174806 key: score_time value: [0.01049232 0.01059413 0.02069259 0.01077557 0.01968479 0.0106771 0.02043557 0.01927018 0.01058149 0.02066064] mean value: 0.015386438369750977 key: test_mcc value: [0.45361105 0.67460105 0.88949918 0.89559105 0.89559105 0.89188259 0.79772404 0.53246753 0.56061191 0.56980288] mean value: 0.7161382335698945 key: train_mcc value: [0.92325474 0.82122399 0.84675102 0.83387364 0.84675102 0.84729198 0.87296284 0.86004923 0.89835373 0.86032048] mean value: 0.8610832667133086 key: test_accuracy value: [0.73684211 0.84210526 0.94736842 0.94736842 0.94736842 0.94444444 0.88888889 0.77777778 0.77777778 0.77777778] mean value: 0.8587719298245614 key: train_accuracy value: [0.96385542 0.91566265 0.92771084 0.92168675 0.92771084 0.92814371 0.94011976 0.93413174 0.95209581 0.93413174] mean value: 0.9345249260515114 key: test_fscore value: [0.7826087 0.86956522 0.96 0.95652174 0.95652174 0.95238095 0.9 
0.81818182 0.84615385 0.8 ] mean value: 0.8841934008020964 key: train_fscore value: [0.97087379 0.93203883 0.94230769 0.93719807 0.94230769 0.94285714 0.95238095 0.94736842 0.96153846 0.9468599 ] mean value: 0.9475730954818289 key: test_precision value: [0.75 0.90909091 0.92307692 1. 1. 1. 1. 0.81818182 0.73333333 0.88888889] mean value: 0.9022571872571873 key: train_precision value: [0.97087379 0.92307692 0.9245283 0.92380952 0.9245283 0.92523364 0.93457944 0.93396226 0.95238095 0.94230769] mean value: 0.9355280830019537 key: test_recall value: [0.81818182 0.83333333 1. 0.91666667 0.91666667 0.90909091 0.81818182 0.81818182 1. 0.72727273] mean value: 0.8757575757575757 key: train_recall value: [0.97087379 0.94117647 0.96078431 0.95098039 0.96078431 0.96116505 0.97087379 0.96116505 0.97087379 0.95145631] mean value: 0.960013325718637 key: test_roc_auc value: [0.72159091 0.8452381 0.92857143 0.95833333 0.95833333 0.95454545 0.90909091 0.76623377 0.71428571 0.79220779] mean value: 0.8548430735930737 key: train_roc_auc value: [0.96162737 0.90808824 0.91789216 0.9129902 0.91789216 0.91808252 0.93074939 0.92589502 0.94637439 0.92885316] mean value: 0.9268444604783661 key: test_jcc value: [0.64285714 0.76923077 0.92307692 0.91666667 0.91666667 0.90909091 0.81818182 0.69230769 0.73333333 0.66666667] mean value: 0.7988078588078588 key: train_jcc value: [0.94339623 0.87272727 0.89090909 0.88181818 0.89090909 0.89189189 0.90909091 0.9 0.92592593 0.89908257] mean value: 0.9005751158494797 MCC on Blind test: 0.09 Accuracy on Blind test: 0.54 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 
'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.00939846 0.00705338 0.00695395 0.00734782 0.00737906 0.00728893 0.00739002 0.00729275 0.00736165 0.00728703] mean value: 0.00747530460357666 key: score_time value: [0.01311707 0.00814056 0.00845981 0.00825524 0.00828815 0.00828743 0.00835443 0.00829411 0.00827003 0.00832057] mean value: 0.008778738975524902 key: test_mcc value: [0.60553007 0.32142857 0.14085904 0.28690229 0.26772484 0.52299758 0.44320263 0.0805823 0.0805823 0.56061191] mean value: 0.33104215300057266 key: train_mcc value: [0.34161624 0.39993512 0.3929602 0.3794614 0.42213076 0.39858139 0.42337541 0.42542126 0.32037061 0.3808643 ] mean value: 0.3884716694211574 key: test_accuracy value: [0.78947368 0.68421053 0.63157895 0.68421053 0.68421053 0.77777778 0.72222222 0.61111111 0.61111111 0.77777778] mean value: 0.6973684210526316 key: train_accuracy value: [0.70481928 0.72289157 0.72289157 0.71686747 0.73493976 0.7245509 0.73652695 0.73652695 0.69461078 0.71856287] mean value: 0.7213188081667989 key: test_fscore value: [0.84615385 0.75 0.74074074 0.76923077 0.78571429 0.83333333 0.81481481 0.74074074 0.74074074 0.84615385] mean value: 0.7867623117623117 key: train_fscore value: [0.79324895 0.80672269 0.79824561 0.79828326 0.8018018 0.80672269 0.80869565 0.81196581 0.78297872 0.79295154] mean value: 0.8001616730332606 key: test_precision value: [0.73333333 0.75 0.66666667 0.71428571 0.6875 0.76923077 0.6875 0.625 0.625 0.73333333] mean value: 0.6991849816849817 key: train_precision value: [0.70149254 0.70588235 0.72222222 
0.70992366 0.74166667 0.71111111 0.73228346 0.72519084 0.6969697 0.72580645] mean value: 0.7172549007220933 key: test_recall value: [1. 0.75 0.83333333 0.83333333 0.91666667 0.90909091 1. 0.90909091 0.90909091 1. ] mean value: 0.906060606060606 key: train_recall value: [0.91262136 0.94117647 0.89215686 0.91176471 0.87254902 0.93203883 0.90291262 0.9223301 0.89320388 0.87378641] mean value: 0.9054540262707025 key: test_roc_auc value: [0.75 0.66071429 0.55952381 0.63095238 0.60119048 0.74025974 0.64285714 0.52597403 0.52597403 0.71428571] mean value: 0.6351731601731602 key: train_roc_auc value: [0.63885036 0.65808824 0.67264093 0.65900735 0.69408701 0.66133192 0.68583131 0.67991505 0.63410194 0.6712682 ] mean value: 0.6655122313893195 key: test_jcc value: [0.73333333 0.6 0.58823529 0.625 0.64705882 0.71428571 0.6875 0.58823529 0.58823529 0.73333333] mean value: 0.6505217086834734 key: train_jcc value: [0.65734266 0.67605634 0.66423358 0.66428571 0.66917293 0.67605634 0.67883212 0.68345324 0.64335664 0.65693431] mean value: 0.6669723860782252 MCC on Blind test: 0.51 Accuracy on Blind test: 0.73 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00765204 0.00983834 0.00933051 0.01027274 0.00990605 0.00989223 0.01009798 0.01009941 0.01041555 0.01013994] mean value: 0.009764480590820312 key: score_time value: [0.00810671 0.00978684 0.00992608 0.0102284 0.01031661 0.01037884 0.01027703 0.01031566 0.01036739 0.01031637] mean value: 0.01000199317932129 key: test_mcc value: [0.33796318 0.54761905 0.65477023 0.7824608 0.80507649 0.76623377 0.28203804 0.34188173 0.44320263 0.52299758] mean value: 0.548424349224469 key: train_mcc value: [0.88657784 0.85954556 0.72631812 0.76988112 0.84858071 0.83737341 0.56743022 0.76293969 0.77046864 0.79393863] mean value: 0.7823053934321447 key: test_accuracy value: [0.68421053 0.78947368 0.84210526 0.89473684 0.89473684 0.88888889 0.5 0.66666667 0.72222222 0.77777778] mean value: 0.7660818713450293 key: train_accuracy value: [0.94578313 0.93373494 0.86746988 0.88554217 0.92771084 0.92215569 0.7245509 0.8742515 0.88622754 0.89820359] mean value: 0.8865630185412308 key: test_fscore value: [0.75 0.83333333 0.88 0.92307692 0.90909091 0.90909091 0.30769231 0.7 0.81481481 0.83333333] mean value: 0.7860432530432531 key: train_fscore value: [0.95566502 0.9468599 0.9009009 0.91479821 0.94059406 0.93596059 0.7125 0.88888889 0.91555556 0.92376682] mean value: 0.9035489946317999 key: test_precision value: [0.69230769 0.83333333 0.84615385 0.85714286 1. 0.90909091 1. 0.77777778 0.6875 0.76923077] mean value: 0.8372537185037185 key: train_precision value: [0.97 0.93333333 0.83333333 0.84297521 0.95 0.95 1. 
0.97674419 0.8442623 0.85833333] mean value: 0.9158981687740049 key: test_recall value: [0.81818182 0.83333333 0.91666667 1. 0.83333333 0.90909091 0.18181818 0.63636364 1. 0.90909091] mean value: 0.8037878787878788 key: train_recall value: [0.94174757 0.96078431 0.98039216 1. 0.93137255 0.9223301 0.55339806 0.81553398 1. 1. ] mean value: 0.9105558728345707 key: test_roc_auc value: [0.65909091 0.77380952 0.81547619 0.85714286 0.91666667 0.88311688 0.59090909 0.67532468 0.64285714 0.74025974] mean value: 0.755465367965368 key: train_roc_auc value: [0.94706426 0.92570466 0.83394608 0.8515625 0.92662377 0.92210255 0.77669903 0.89214199 0.8515625 0.8671875 ] mean value: 0.879459484036333 key: test_jcc value: [0.6 0.71428571 0.78571429 0.85714286 0.83333333 0.83333333 0.18181818 0.53846154 0.6875 0.71428571] mean value: 0.6745874958374959 key: train_jcc value: [0.91509434 0.89908257 0.81967213 0.84297521 0.88785047 0.87962963 0.55339806 0.8 0.8442623 0.85833333] mean value: 0.830029802977617 MCC on Blind test: 0.13 Accuracy on Blind test: 0.55 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, 
colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01018405 0.01003146 0.00991631 0.01017022 0.01003242 0.01013088 0.00956511 0.00980854 0.01098204 0.01009631] mean value: 0.010091733932495118 key: score_time value: [0.01030588 0.01020026 0.0102222 0.01032352 0.01022243 0.01029515 0.01032782 0.01033735 0.01036525 0.01023245] mean value: 0.010283231735229492 key: test_mcc value: [0.5077524 0.51887452 0.72456884 0.80507649 0.6761234 0.66254135 0.1934765 0.44320263 0.67005939 0.43320011] mean value: 0.5634875632497535 key: train_mcc value: [0.73618348 0.59399514 0.70269787 0.86061598 0.60495638 0.88573143 0.29075534 0.82931725 0.66982421 0.82396818] mean value: 0.6998045257344441 key: test_accuracy value: [0.73684211 0.68421053 0.84210526 0.89473684 0.84210526 0.83333333 0.44444444 0.72222222 0.83333333 0.72222222] mean value: 0.7555555555555555 key: train_accuracy value: [0.87349398 0.75301205 0.84337349 0.93373494 0.80120482 0.94610778 0.50299401 0.91616766 0.80838323 0.91616766] mean value: 0.8294639636389871 key: test_fscore value: [0.81481481 0.66666667 0.85714286 0.90909091 0.88888889 0.85714286 0.16666667 0.81481481 0.88 0.76190476] mean value: 0.7617133237133237 key: train_fscore value: [0.9058296 0.75151515 0.86021505 0.94581281 0.86075949 0.95652174 0.32520325 0.93636364 0.81818182 0.93137255] mean value: 0.8291775097971825 key: test_precision value: [0.6875 1. 1. 1. 0.8 0.9 1. 0.6875 0.78571429 0.8 ] mean value: 0.8660714285714286 key: train_precision value: [0.84166667 0.98412698 0.95238095 0.95049505 0.75555556 0.95192308 1. 
0.88034188 0.98630137 0.94059406] mean value: 0.924338559476902 key: test_recall value: [1. 0.5 0.75 0.83333333 1. 0.81818182 0.09090909 1. 1. 0.72727273] mean value: 0.771969696969697 key: train_recall value: [0.98058252 0.60784314 0.78431373 0.94117647 1. 0.96116505 0.19417476 1. 0.69902913 0.9223301 ] mean value: 0.8090614886731392 key: test_roc_auc value: [0.6875 0.75 0.875 0.91666667 0.78571429 0.83766234 0.54545455 0.64285714 0.78571429 0.72077922] mean value: 0.7547348484848485 key: train_roc_auc value: [0.83949761 0.79610907 0.86090686 0.93152574 0.7421875 0.94152002 0.59708738 0.890625 0.84170206 0.91429005] mean value: 0.8355451292572045 key: test_jcc value: [0.6875 0.5 0.75 0.83333333 0.8 0.75 0.09090909 0.6875 0.78571429 0.61538462] mean value: 0.6500341325341326 key: train_jcc value: [0.82786885 0.60194175 0.75471698 0.89719626 0.75555556 0.91666667 0.19417476 0.88034188 0.69230769 0.87155963] mean value: 0.7392330028027021 MCC on Blind test: 0.22 Accuracy on Blind test: 0.57 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.08395481 0.0697968 0.07071424 0.07110596 0.07078338 0.07142019 0.07246375 0.07124639 0.07080793 0.07235765] mean value: 0.07246510982513428 key: score_time value: [0.0151608 0.01497865 0.01519632 0.01489806 0.01543808 0.01554489 0.01553178 0.01531959 0.01536942 0.01543546] mean value: 0.015287303924560547 key: test_mcc value: [0.60553007 0.54761905 1. 0.89559105 0.67460105 0.76623377 0.79772404 0.52299758 0.88640526 0.48416483] mean value: 0.7180866701858623 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.78947368 0.78947368 1. 0.94736842 0.84210526 0.88888889 0.88888889 0.77777778 0.94444444 0.72222222] mean value: 0.8590643274853801 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.84615385 0.83333333 1. 0.95652174 0.86956522 0.90909091 0.9 0.83333333 0.95652174 0.73684211] mean value: 0.8841362222826754 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.73333333 0.83333333 1. 1. 0.90909091 0.90909091 1. 0.76923077 0.91666667 0.875 ] mean value: 0.8945745920745921 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.83333333 1. 0.91666667 0.83333333 0.90909091 0.81818182 0.90909091 1. 0.63636364] mean value: 0.8856060606060606 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75 0.77380952 1. 
0.95833333 0.8452381 0.88311688 0.90909091 0.74025974 0.92857143 0.74675325] mean value: 0.8535173160173161 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.73333333 0.71428571 1. 0.91666667 0.76923077 0.83333333 0.81818182 0.71428571 0.91666667 0.58333333] mean value: 0.799931734931735 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: -0.0 Accuracy on Blind test: 0.5 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.02634501 0.02745032 0.03282142 0.02707553 0.03081536 0.03036213 0.03100204 0.03243303 0.0233736 0.02885413] mean value: 0.029053258895874023 key: score_time value: [0.01942372 0.01645255 0.02343774 0.01584172 0.02157259 0.01987481 0.02784896 0.02355909 0.01557946 0.0171895 ] mean value: 0.020078015327453614 key: test_mcc value: [0.56729535 0.89559105 0.89559105 1. 0.67460105 0.76623377 0.79772404 0.56061191 0.88640526 0.64465837] mean value: 0.768871184699948 key: train_mcc value: [1. 0.97457108 0.97457108 1. 
0.98740179 0.98737524 0.96301704 0.97466626 0.98744925 0.94933931] mean value: 0.9798391039351805 key: test_accuracy value: [0.78947368 0.94736842 0.94736842 1. 0.84210526 0.88888889 0.88888889 0.77777778 0.94444444 0.83333333] mean value: 0.8859649122807017 key: train_accuracy value: [1. 0.98795181 0.98795181 1. 0.9939759 0.99401198 0.98203593 0.98802395 0.99401198 0.9760479 ] mean value: 0.9904011254599235 key: test_fscore value: [0.83333333 0.95652174 0.95652174 1. 0.86956522 0.90909091 0.9 0.84615385 0.95652174 0.86956522] mean value: 0.9097273740752001 key: train_fscore value: [1. 0.99019608 0.99019608 1. 0.99507389 0.99516908 0.98522167 0.99029126 0.99512195 0.98076923] mean value: 0.9922039249615477 key: test_precision value: [0.76923077 1. 1. 1. 0.90909091 0.90909091 1. 0.73333333 0.91666667 0.83333333] mean value: 0.9070745920745921 key: train_precision value: [1. 0.99019608 0.99019608 1. 1. 0.99038462 1. 0.99029126 1. 0.97142857] mean value: 0.9932496605811855 key: test_recall value: [0.90909091 0.91666667 0.91666667 1. 0.83333333 0.90909091 0.81818182 1. 1. 0.90909091] mean value: 0.9212121212121211 key: train_recall value: [1. 0.99019608 0.99019608 1. 0.99019608 1. 0.97087379 0.99029126 0.99029126 0.99029126] mean value: 0.9912335808109651 key: test_roc_auc value: [0.76704545 0.95833333 0.95833333 1. 0.8452381 0.88311688 0.90909091 0.71428571 0.92857143 0.81168831] mean value: 0.8775703463203464 key: train_roc_auc value: [1. 0.98728554 0.98728554 1. 0.99509804 0.9921875 0.98543689 0.98733313 0.99514563 0.97170813] mean value: 0.9901480404054825 key: test_jcc value: [0.71428571 0.91666667 0.91666667 1. 0.76923077 0.83333333 0.81818182 0.73333333 0.91666667 0.76923077] mean value: 0.8387595737595738 key: train_jcc value: [1. 0.98058252 0.98058252 1. 
0.99019608 0.99038462 0.97087379 0.98076923 0.99029126 0.96226415] mean value: 0.9845944172615994 MCC on Blind test: 0.1 Accuracy on Blind test: 0.54 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.01885796 0.01930904 0.02562237 0.02129722 0.06075287 0.03211212 0.04384661 0.03204489 0.0758667 0.05150294] mean value: 0.038121271133422854 key: score_time value: [0.01133871 0.01133037 0.0113616 0.02015972 0.02054238 0.01122904 0.011343 0.01591635 0.02104354 0.01130295] mean value: 0.01455676555633545 key: test_mcc value: [ 0.40219983 0.26772484 0.28690229 0.18531233 0.44908871 0.2548236 0.39594419 -0.05096472 0.3040345 0.67005939] mean value: 0.31651249570546197 key: train_mcc value: [0.88606149 0.90075726 0.87457979 0.88685769 0.92515014 0.91320801 0.89953068 0.91188694 0.87498674 0.94997541] mean value: 0.9022994142722148 key: test_accuracy value: [0.68421053 0.68421053 0.68421053 0.63157895 0.73684211 0.66666667 0.72222222 0.55555556 0.66666667 0.83333333] mean value: 0.6865497076023391 key: train_accuracy value: [0.94578313 0.95180723 0.93975904 0.94578313 0.96385542 0.95808383 0.95209581 0.95808383 0.94011976 
0.9760479 ] mean value: 0.953141908953178 key: test_fscore value: [0.78571429 0.78571429 0.76923077 0.72 0.82758621 0.76923077 0.8 0.69230769 0.78571429 0.88 ] mean value: 0.7815498294808639 key: train_fscore value: [0.95774648 0.96226415 0.95283019 0.95734597 0.97142857 0.96713615 0.96226415 0.96682464 0.95327103 0.98095238] mean value: 0.9632063716206098 key: test_precision value: [0.64705882 0.6875 0.71428571 0.69230769 0.70588235 0.66666667 0.71428571 0.6 0.64705882 0.78571429] mean value: 0.6860760073260074 key: train_precision value: [0.92727273 0.92727273 0.91818182 0.9266055 0.94444444 0.93636364 0.93577982 0.94444444 0.91891892 0.96261682] mean value: 0.9341900860429541 key: test_recall value: [1. 0.91666667 0.83333333 0.75 1. 0.90909091 0.90909091 0.81818182 1. 1. ] mean value: 0.9136363636363636 key: train_recall value: [0.99029126 1. 0.99019608 0.99019608 1. 1. 0.99029126 0.99029126 0.99029126 1. ] mean value: 0.9941557205406435 key: test_roc_auc value: [0.625 0.60119048 0.63095238 0.58928571 0.64285714 0.5974026 0.66883117 0.48051948 0.57142857 0.78571429] mean value: 0.6193181818181818 key: train_roc_auc value: [0.93165357 0.9375 0.92478554 0.93259804 0.953125 0.9453125 0.94045813 0.94827063 0.92483313 0.96875 ] mean value: 0.9407286539211154 key: test_jcc value: [0.64705882 0.64705882 0.625 0.5625 0.70588235 0.625 0.66666667 0.52941176 0.64705882 0.78571429] mean value: 0.6441351540616247 key: train_jcc value: [0.91891892 0.92727273 0.90990991 0.91818182 0.94444444 0.93636364 0.92727273 0.93577982 0.91071429 0.96261682] mean value: 0.9291475107022136 MCC on Blind test: 0.33 Accuracy on Blind test: 0.64 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', 
SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.12610126 0.12060261 0.11894894 0.11790848 0.12793779 0.12083268 0.12258196 0.12219334 0.12002635 0.11419153] mean value: 0.121132493019104 key: score_time value: [0.00943565 0.00874519 0.00873065 0.00975442 0.00927114 0.00965858 0.00981712 0.01030612 0.00883412 0.00869703] mean value: 0.009325003623962403 key: test_mcc value: [0.45361105 1. 0.80507649 1. 0.88949918 0.76623377 1. 0.56061191 0.88640526 0.64465837] mean value: 0.8006096026925367 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73684211 1. 0.89473684 1. 0.94736842 0.88888889 1. 0.77777778 0.94444444 0.83333333] mean value: 0.9023391812865497 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.7826087 1. 0.90909091 1. 0.96 0.90909091 1. 0.84615385 0.95652174 0.86956522] mean value: 0.9233031316509577 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 1. 1. 1. 0.92307692 0.90909091 1. 0.73333333 0.91666667 0.83333333] mean value: 0.9065501165501165 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 1. 0.83333333 1. 1. 0.90909091 1. 1. 1. 0.90909091] mean value: 0.946969696969697 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.72159091 1. 0.91666667 1. 0.92857143 0.88311688 1. 
0.71428571 0.92857143 0.81168831] mean value: 0.8904491341991343 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.64285714 1. 0.83333333 1. 0.92307692 0.83333333 1. 0.73333333 0.91666667 0.76923077] mean value: 0.8651831501831502 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.14 Accuracy on Blind test: 0.55 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear 
warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.02101755 0.03171253 0.02527022 0.01208258 0.01173353 0.01204896 0.01371312 0.01347113 0.01202703 0.01235557] mean value: 0.01654322147369385 key: score_time value: [0.01127648 0.01125073 0.01176476 0.01200485 0.01166868 0.01086617 0.01114273 0.01123142 0.0110383 0.01096892] mean value: 0.011321306228637695 key: test_mcc value: [0.4719399 0.40849122 0.09356015 0.44908871 0.56694671 0.26856633 0.44320263 0.0805823 0.0805823 0.66254135] mean value: 0.3525501597195837 key: train_mcc value: [0.6002326 0.50998847 0.67610805 0.54823412 0.49142346 0.64107028 0.54903745 0.61519707 0.55309666 0.78305013] mean value: 0.5967438289020631 key: test_accuracy value: [0.73684211 0.73684211 0.63157895 0.73684211 0.78947368 0.66666667 0.72222222 0.61111111 0.61111111 0.83333333] mean value: 0.7076023391812866 key: train_accuracy value: [0.81325301 
0.75903614 0.8373494 0.77710843 0.75903614 0.83233533 0.77844311 0.82035928 0.79041916 0.89820359] mean value: 0.8065543611572037 key: test_fscore value: [0.76190476 0.81481481 0.75862069 0.82758621 0.85714286 0.75 0.81481481 0.74074074 0.74074074 0.85714286] mean value: 0.7923508483853311 key: train_fscore value: [0.85167464 0.83471074 0.88311688 0.84518828 0.83050847 0.87272727 0.84647303 0.86486486 0.83253589 0.91943128] mean value: 0.858123135858806 key: test_precision value: [0.8 0.73333333 0.64705882 0.70588235 0.75 0.69230769 0.6875 0.625 0.625 0.9 ] mean value: 0.7166082202111614 key: train_precision value: [0.83962264 0.72142857 0.79069767 0.73722628 0.73134328 0.82051282 0.73913043 0.80672269 0.82075472 0.89814815] mean value: 0.7905587257811302 key: test_recall value: [0.72727273 0.91666667 0.91666667 1. 1. 0.81818182 1. 0.90909091 0.90909091 0.81818182] mean value: 0.9015151515151515 key: train_recall value: [0.86407767 0.99019608 1. 0.99019608 0.96078431 0.93203883 0.99029126 0.93203883 0.84466019 0.94174757] mean value: 0.9446030839520274 key: test_roc_auc value: [0.73863636 0.67261905 0.5297619 0.64285714 0.71428571 0.62337662 0.64285714 0.52597403 0.52597403 0.83766234] mean value: 0.6454004329004329 key: train_roc_auc value: [0.7971182 0.69041054 0.7890625 0.71384804 0.69914216 0.80195692 0.71389563 0.78633192 0.7738926 0.88493629] mean value: 0.7650594784839503 key: test_jcc value: [0.61538462 0.6875 0.61111111 0.70588235 0.75 0.6 0.6875 0.58823529 0.58823529 0.75 ] mean value: 0.6583848667672197 key: train_jcc value: [0.74166667 0.71631206 0.79069767 0.73188406 0.71014493 0.77419355 0.73381295 0.76190476 0.71311475 0.85087719] mean value: 0.7524608590343069 MCC on Blind test: 0.32 Accuracy on Blind test: 0.61 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', 
GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 
'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.0152986 0.01060939 0.01029634 0.01028705 0.01044655 0.01044083 0.01033711 0.01052427 0.01060605 0.01032233] mean value: 0.010916852951049804 key: score_time value: [0.01142406 0.01068401 0.01064181 0.01059437 0.01238728 0.01085711 0.01055193 0.0106585 0.01086307 0.01084256] mean value: 0.010950469970703125 key: test_mcc value: [0.21660006 0.67460105 0.77380952 0.80507649 0.89559105 0.76623377 0.79772404 0.67005939 0.56061191 0.66254135] mean value: 0.6822848626279371 key: train_mcc value: [0.92308458 0.85954556 0.88685769 0.88521749 0.83387364 0.89863369 0.84736815 0.87286094 0.89835373 0.88573143] mean value: 0.8791526890998981 key: test_accuracy value: [0.63157895 0.84210526 0.89473684 0.89473684 0.94736842 0.88888889 0.88888889 0.83333333 0.77777778 0.83333333] mean value: 0.8432748538011696 key: train_accuracy value: [0.96385542 0.93373494 0.94578313 0.94578313 0.92168675 0.95209581 0.92814371 0.94011976 0.95209581 0.94610778] mean value: 0.9429406247745473 key: test_fscore value: [0.72 0.86956522 0.91666667 0.90909091 0.95652174 0.90909091 0.9 0.88 0.84615385 0.85714286] mean value: 0.8764232144666927 key: train_fscore value: [0.97115385 0.9468599 0.95734597 0.95652174 0.93719807 0.96190476 0.94230769 0.95192308 0.96153846 0.95652174] mean value: 0.9543275259667182 key: test_precision value: [0.64285714 0.90909091 0.91666667 1. 1. 
0.90909091 1. 0.78571429 0.73333333 0.9 ] mean value: 0.8796753246753246 key: train_precision value: [0.96190476 0.93333333 0.9266055 0.94285714 0.92380952 0.94392523 0.93333333 0.94285714 0.95238095 0.95192308] mean value: 0.9412930005631284 key: test_recall value: [0.81818182 0.83333333 0.91666667 0.83333333 0.91666667 0.90909091 0.81818182 1. 1. 0.81818182] mean value: 0.8863636363636364 key: train_recall value: [0.98058252 0.96078431 0.99019608 0.97058824 0.95098039 0.98058252 0.95145631 0.96116505 0.97087379 0.96116505] mean value: 0.967837426232629 key: test_roc_auc value: [0.59659091 0.8452381 0.88690476 0.91666667 0.95833333 0.88311688 0.90909091 0.78571429 0.71428571 0.83766234] mean value: 0.8333603896103896 key: train_roc_auc value: [0.95854523 0.92570466 0.93259804 0.93841912 0.9129902 0.94341626 0.92104066 0.93370752 0.94637439 0.94152002] mean value: 0.9354316099417114 key: test_jcc value: [0.5625 0.76923077 0.84615385 0.83333333 0.91666667 0.83333333 0.81818182 0.78571429 0.73333333 0.75 ] mean value: 0.7848447385947386 key: train_jcc value: [0.94392523 0.89908257 0.91818182 0.91666667 0.88181818 0.9266055 0.89090909 0.90825688 0.92592593 0.91666667] mean value: 0.9128038537941651 MCC on Blind test: 0.21 Accuracy on Blind test: 0.59 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:122: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:125: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: 
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.0855453 0.08357239 0.08917785 0.08271241 0.08259749 0.08286834 0.08288574 0.08441114 0.08273721 0.08266902] mean value: 0.08391768932342529 key: score_time value: [0.01079178 0.01085567 0.01092529 0.01075959 0.01109052 0.01090479 0.0108192 0.01083088 0.01145196 0.01102829] mean value: 0.010945796966552734 key: test_mcc value: [0.21660006 0.67460105 0.77380952 0.89559105 0.89559105 0.76623377 0.79772404 0.67005939 0.56061191 0.56980288] mean value: 0.6820624726317762 key: train_mcc value: [0.92308458 0.85980258 0.88685769 0.83387364 0.83400835 0.89863369 0.87296284 0.87286094 0.89835373 0.86032048] mean value: 0.8740758514702531 key: test_accuracy value: [0.63157895 0.84210526 0.89473684 0.94736842 0.94736842 0.88888889 0.88888889 0.83333333 0.77777778 0.77777778] mean value: 0.8429824561403508 key: train_accuracy value: [0.96385542 0.93373494 0.94578313 0.92168675 0.92168675 0.95209581 0.94011976 0.94011976 0.95209581 0.93413174] mean 
value: 0.9405309862203304 key: test_fscore value: [0.72 0.86956522 0.91666667 0.95652174 0.95652174 0.90909091 0.9 0.88 0.84615385 0.8 ] mean value: 0.8754520117563596 key: train_fscore value: [0.97115385 0.94634146 0.95734597 0.93719807 0.93779904 0.96190476 0.95238095 0.95192308 0.96153846 0.9468599 ] mean value: 0.9524445547956408 key: test_precision value: [0.64285714 0.90909091 0.91666667 1. 1. 0.90909091 1. 0.78571429 0.73333333 0.88888889] mean value: 0.8785642135642135 key: train_precision value: [0.96190476 0.94174757 0.9266055 0.92380952 0.91588785 0.94392523 0.93457944 0.94285714 0.95238095 0.94230769] mean value: 0.9386005674027249 key: test_recall value: [0.81818182 0.83333333 0.91666667 0.91666667 0.91666667 0.90909091 0.81818182 1. 1. 0.72727273] mean value: 0.8856060606060606 key: train_recall value: [0.98058252 0.95098039 0.99019608 0.95098039 0.96078431 0.98058252 0.97087379 0.96116505 0.97087379 0.95145631] mean value: 0.9668475157053112 key: test_roc_auc value: [0.59659091 0.8452381 0.88690476 0.95833333 0.95833333 0.88311688 0.90909091 0.78571429 0.71428571 0.79220779] mean value: 0.8329816017316017 key: train_roc_auc value: [0.95854523 0.9286152 0.93259804 0.9129902 0.91007966 0.94341626 0.93074939 0.93370752 0.94637439 0.92885316] mean value: 0.9325929046780524 key: test_jcc value: [0.5625 0.76923077 0.84615385 0.91666667 0.91666667 0.83333333 0.81818182 0.78571429 0.73333333 0.66666667] mean value: 0.7848447385947386 key: train_jcc value: [0.94392523 0.89814815 0.91818182 0.88181818 0.88288288 0.9266055 0.90909091 0.90825688 0.92592593 0.89908257] mean value: 0.9093918053821166 MCC on Blind test: 0.1 Accuracy on Blind test: 0.54 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.02180219 0.02042246 0.01844001 0.01872444 0.01910233 0.01665592 0.0174613 0.01886535 0.02107906 0.0179646 ] mean value: 0.019051766395568846 key: score_time value: [0.01086044 0.01106286 0.01095486 0.01095009 0.01091909 0.01115131 0.01068163 0.010638 0.01093936 0.01097393] mean value: 0.01091315746307373 key: test_mcc value: [0.58002308 0.48856385 0.41096386 0.56490196 0.74242424 0.74047959 0.82575758 0.91666667 0.83205029 0.63636364] mean value: 0.6738194748889874 key: train_mcc value: [0.80500813 0.77565201 0.79548704 0.77563066 0.7469525 0.76601619 0.76597166 0.76597166 0.73817726 0.81557242] mean value: 0.7750439506475604 key: test_accuracy value: [0.7826087 0.73913043 0.69565217 0.7826087 0.86956522 0.86956522 0.91304348 0.95652174 0.90909091 0.81818182] mean value: 0.833596837944664 key: train_accuracy value: [0.90243902 0.88780488 0.89756098 0.88780488 0.87317073 0.88292683 0.88292683 0.88292683 0.86893204 0.90776699] mean value: 0.887426000473597 key: test_fscore value: [0.73684211 0.75 0.72 0.76190476 0.86956522 0.88 0.91666667 0.95652174 0.91666667 0.81818182] mean value: 0.832634897520481 key: train_fscore value: [0.90384615 0.88780488 0.89655172 0.88888889 0.875 0.88349515 0.88118812 0.88118812 0.86699507 0.90731707] mean value: 0.8872275175238942 key: test_precision value: [0.875 0.69230769 0.64285714 0.8 0.90909091 0.84615385 0.91666667 1. 
0.84615385 0.81818182] mean value: 0.8346411921411921 key: train_precision value: [0.8952381 0.89215686 0.91 0.88461538 0.85849057 0.875 0.89 0.89 0.88 0.91176471] mean value: 0.8887265614518667 key: test_recall value: [0.63636364 0.81818182 0.81818182 0.72727273 0.83333333 0.91666667 0.91666667 0.91666667 1. 0.81818182] mean value: 0.8401515151515152 key: train_recall value: [0.91262136 0.88349515 0.88349515 0.89320388 0.89215686 0.89215686 0.87254902 0.87254902 0.85436893 0.90291262] mean value: 0.8859508852084523 key: test_roc_auc value: [0.77651515 0.74242424 0.70075758 0.78030303 0.87121212 0.86742424 0.91287879 0.95833333 0.90909091 0.81818182] mean value: 0.8337121212121211 key: train_roc_auc value: [0.90238911 0.887826 0.89762993 0.88777841 0.8732629 0.88297164 0.88287645 0.88287645 0.86893204 0.90776699] mean value: 0.8874309918142015 key: test_jcc value: [0.58333333 0.6 0.5625 0.61538462 0.76923077 0.78571429 0.84615385 0.91666667 0.84615385 0.69230769] mean value: 0.7217445054945055 key: train_jcc value: [0.8245614 0.79824561 0.8125 0.8 0.77777778 0.79130435 0.78761062 0.78761062 0.76521739 0.83035714] mean value: 0.7975184916247269 MCC on Blind test: 0.35 Accuracy on Blind test: 0.67 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.70506549 0.62887144 0.6596272 0.79791594 0.68545508 0.64333749 0.76110959 0.69955564 0.68501639 0.77351475] mean value: 0.7039469003677368 key: score_time value: [0.01389217 0.01472902 0.01154828 0.01132441 0.01137733 0.01125288 0.02320051 0.01134515 0.01451206 0.01475286] mean value: 0.013793468475341797 key: test_mcc value: [0.76277007 0.66414149 0.48856385 0.74047959 0.74242424 0.82575758 0.65151515 0.58930667 0.68313005 0.83205029] mean value: 0.6980138984393582 key: train_mcc value: [0.92211753 0.97077583 0.93174679 0.87320324 0.91259644 0.91259644 0.88292404 0.94163576 1. 1. 
] mean value: 0.9347596066723304 key: test_accuracy value: [0.86956522 0.82608696 0.73913043 0.86956522 0.86956522 0.91304348 0.82608696 0.7826087 0.81818182 0.90909091] mean value: 0.8422924901185771 key: train_accuracy value: [0.96097561 0.98536585 0.96585366 0.93658537 0.95609756 0.95609756 0.94146341 0.97073171 1. 1. ] mean value: 0.9673170731707317 key: test_fscore value: [0.84210526 0.83333333 0.75 0.85714286 0.86956522 0.91666667 0.83333333 0.76190476 0.77777778 0.91666667] mean value: 0.8358495877374597 key: train_fscore value: [0.96153846 0.98550725 0.96618357 0.93719807 0.95652174 0.95652174 0.94117647 0.97029703 1. 1. ] mean value: 0.9674944328979426 key: test_precision value: [1. 0.76923077 0.69230769 0.9 0.90909091 0.91666667 0.83333333 0.88888889 1. 0.84615385] mean value: 0.8755672105672105 key: train_precision value: [0.95238095 0.98076923 0.96153846 0.93269231 0.94285714 0.94285714 0.94117647 0.98 1. 1. ] mean value: 0.9634271708683473 key: test_recall value: [0.72727273 0.90909091 0.81818182 0.81818182 0.83333333 0.91666667 0.83333333 0.66666667 0.63636364 1. ] mean value: 0.8159090909090909 key: train_recall value: [0.97087379 0.99029126 0.97087379 0.94174757 0.97058824 0.97058824 0.94117647 0.96078431 1. 1. ] mean value: 0.9716923662668951 key: test_roc_auc value: [0.86363636 0.82954545 0.74242424 0.86742424 0.87121212 0.91287879 0.82575758 0.78787879 0.81818182 0.90909091] mean value: 0.8428030303030303 key: train_roc_auc value: [0.96092709 0.98534171 0.96582905 0.93656006 0.9561679 0.9561679 0.94146202 0.97068342 1. 1. ] mean value: 0.9673139158576052 key: test_jcc value: [0.72727273 0.71428571 0.6 0.75 0.76923077 0.84615385 0.71428571 0.61538462 0.63636364 0.84615385] mean value: 0.721913086913087 key: train_jcc value: [0.92592593 0.97142857 0.93457944 0.88181818 0.91666667 0.91666667 0.88888889 0.94230769 1. 1. 
] mean value: 0.937828203295493 MCC on Blind test: 0.04 Accuracy on Blind test: 0.52 Model_name: Gaussian NB Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01285195 0.00952053 0.00782275 0.00767875 0.00741482 0.0068872 0.00694108 0.00817084 0.00734329 0.00712538] mean value: 0.0081756591796875 key: score_time value: [0.01059294 0.00885248 0.00852776 0.00859118 0.00863194 0.00855207 0.00864601 0.00876474 0.00838137 0.00814605] mean value: 0.008768653869628907 key: test_mcc value: [0.11236664 0.43929769 0.44411739 0.41096386 0.47923384 0.50168817 0.40451992 0.55048188 0.47140452 0.20412415] mean value: 0.4018198054310791 key: train_mcc value: [0.3148712 0.50657911 0.52847427 0.5185658 0.43504485 0.51678072 0.4680327 0.45392287 0.49379046 0.43864549] mean value: 0.4674707470646864 key: test_accuracy value: [0.52173913 0.65217391 0.69565217 0.69565217 0.69565217 0.73913043 0.65217391 0.73913043 0.68181818 0.59090909] mean value: 0.6664031620553359 key: train_accuracy value: [0.60487805 0.72682927 0.73658537 0.72682927 0.68292683 0.73170732 0.70243902 0.69756098 0.7184466 0.68932039] mean value: 
0.7017523087852238 key: test_fscore value: [0.64516129 0.73333333 0.74074074 0.72 0.77419355 0.78571429 0.75 0.8 0.75862069 0.66666667] mean value: 0.7374430554819876 key: train_fscore value: [0.71378092 0.77777778 0.78571429 0.78125 0.74903475 0.77911647 0.76078431 0.75590551 0.77165354 0.75193798] mean value: 0.7626955550457906 key: test_precision value: [0.5 0.57894737 0.625 0.64285714 0.63157895 0.6875 0.6 0.66666667 0.61111111 0.5625 ] mean value: 0.6106161236424394 key: train_precision value: [0.56111111 0.65771812 0.66442953 0.65359477 0.61783439 0.65986395 0.63398693 0.63157895 0.64900662 0.62580645] mean value: 0.6354930823444798 key: test_recall value: [0.90909091 1. 0.90909091 0.81818182 1. 0.91666667 1. 1. 1. 0.81818182] mean value: 0.9371212121212121 key: train_recall value: [0.98058252 0.95145631 0.96116505 0.97087379 0.95098039 0.95098039 0.95098039 0.94117647 0.95145631 0.94174757] mean value: 0.9551399200456882 key: test_roc_auc value: [0.53787879 0.66666667 0.70454545 0.70075758 0.68181818 0.73106061 0.63636364 0.72727273 0.68181818 0.59090909] mean value: 0.6659090909090909 key: train_roc_auc value: [0.60303636 0.72572816 0.73548449 0.72563297 0.68422806 0.73277175 0.70364554 0.69874358 0.7184466 0.68932039] mean value: 0.7017037883114411 key: test_jcc value: [0.47619048 0.57894737 0.58823529 0.5625 0.63157895 0.64705882 0.6 0.66666667 0.61111111 0.5 ] mean value: 0.5862288687404786 key: train_jcc value: [0.55494505 0.63636364 0.64705882 0.64102564 0.59876543 0.63815789 0.61392405 0.60759494 0.62820513 0.60248447] mean value: 0.6168525070295942 MCC on Blind test: 0.37 Accuracy on Blind test: 0.65 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0081985 0.00712252 0.00714064 0.00715804 0.00717282 0.00716138 0.00722837 0.00712657 0.00720763 0.00717926] mean value: 0.007269573211669922 key: score_time value: [0.00871158 0.00801897 0.0079174 0.00812721 0.00795984 0.00803661 0.00796866 0.00801921 0.00809813 0.00810313] mean value: 0.008096075057983399 key: test_mcc value: [0.30240737 0.05427825 0.03816905 0.3030303 0.42228828 0.30240737 0.03816905 0.65151515 0.37796447 0.27272727] mean value: 0.2762956564879194 key: train_mcc value: [0.35623111 0.34638101 0.37560698 0.3463735 0.28783552 0.36612372 0.35687769 0.3658258 0.32044877 0.39058328] mean value: 0.3512287378448915 key: test_accuracy value: [0.65217391 0.52173913 0.52173913 0.65217391 0.69565217 0.65217391 0.52173913 0.82608696 0.68181818 0.63636364] mean value: 0.6361660079051383 key: train_accuracy value: [0.67804878 0.67317073 0.68780488 0.67317073 0.64390244 0.68292683 0.67804878 0.68292683 0.66019417 0.69417476] mean value: 0.6754368932038836 key: test_fscore value: [0.6 0.56 0.47619048 0.63636364 0.75862069 0.69230769 0.56 0.83333333 0.72 0.63636364] mean value: 0.6473179464213947 key: train_fscore value: [0.68571429 0.67942584 0.69230769 0.67317073 0.64390244 0.67336683 0.68571429 0.67980296 0.65686275 0.70967742] mean value: 0.6779945226077302 key: test_precision value: [0.66666667 0.5 0.5 0.63636364 0.64705882 0.64285714 0.53846154 0.83333333 0.64285714 0.63636364] mean value: 
0.6243961920432509 key: train_precision value: [0.6728972 0.66981132 0.68571429 0.67647059 0.6407767 0.69072165 0.66666667 0.68316832 0.66336634 0.6754386 ] mean value: 0.6725031656102882 key: test_recall value: [0.54545455 0.63636364 0.45454545 0.63636364 0.91666667 0.75 0.58333333 0.83333333 0.81818182 0.63636364] mean value: 0.681060606060606 key: train_recall value: [0.69902913 0.68932039 0.69902913 0.66990291 0.64705882 0.65686275 0.70588235 0.67647059 0.65048544 0.74757282] mean value: 0.6841614315629164 key: test_roc_auc value: [0.64772727 0.52651515 0.51893939 0.65151515 0.68560606 0.64772727 0.51893939 0.82575758 0.68181818 0.63636364] mean value: 0.634090909090909 key: train_roc_auc value: [0.67794594 0.67309157 0.68774986 0.67318675 0.64391776 0.6828003 0.67818389 0.68289549 0.66019417 0.69417476] mean value: 0.6754140491147915 key: test_jcc value: [0.42857143 0.38888889 0.3125 0.46666667 0.61111111 0.52941176 0.38888889 0.71428571 0.5625 0.46666667] mean value: 0.4869491129785247 key: train_jcc value: [0.52173913 0.51449275 0.52941176 0.50735294 0.47482014 0.50757576 0.52173913 0.51492537 0.48905109 0.55 ] mean value: 0.5131108089860595 MCC on Blind test: 0.35 Accuracy on Blind test: 0.67 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00741982 0.00747585 0.00764775 0.00689459 0.00716424 0.00761509 0.00779819 0.00776052 0.00681782 0.00763607] mean value: 0.007422995567321777 key: score_time value: [0.01033688 0.00987959 0.00997186 0.00971317 0.00995612 0.00998878 0.00992155 0.01000547 0.00932527 0.00970054] mean value: 0.00987992286682129 key: test_mcc value: [-0.12878788 0.3030303 0.12878788 0.38932432 0.56490196 0.15096491 0.50460839 0.65909298 0.32539569 0.37796447] mean value: 0.3275283023309783 key: train_mcc value: [0.61013747 0.58290698 0.56242364 0.62329827 0.62174364 0.55771431 0.60061066 0.58363235 0.58722022 0.59402749] mean value: 0.5923715024669853 key: test_accuracy value: [0.43478261 0.65217391 0.56521739 0.69565217 0.7826087 0.56521739 0.69565217 0.82608696 0.63636364 0.68181818] mean value: 0.6535573122529644 key: train_accuracy value: [0.80487805 0.7902439 0.7804878 0.8097561 0.8097561 0.77560976 0.8 0.7902439 0.79126214 0.7961165 ] mean value: 0.7948354250532796 key: test_fscore value: [0.43478261 0.63636364 0.54545455 0.66666667 0.8 0.5 0.58823529 0.84615385 0.5 0.63157895] mean value: 0.6149235544820415 key: train_fscore value: [0.80952381 0.78172589 0.77386935 0.8 0.8 0.75531915 0.79396985 0.77720207 0.77720207 0.78787879] mean value: 0.785669097572126 key: test_precision value: [0.41666667 0.63636364 0.54545455 0.7 0.76923077 0.625 1. 
0.78571429 0.8 0.75 ] mean value: 0.7028429903429904 key: train_precision value: [0.79439252 0.81914894 0.80208333 0.84782609 0.83870968 0.8255814 0.81443299 0.82417582 0.83333333 0.82105263] mean value: 0.8220736731371573 key: test_recall value: [0.45454545 0.63636364 0.54545455 0.63636364 0.83333333 0.41666667 0.41666667 0.91666667 0.36363636 0.54545455] mean value: 0.5765151515151515 key: train_recall value: [0.82524272 0.74757282 0.74757282 0.75728155 0.76470588 0.69607843 0.7745098 0.73529412 0.72815534 0.75728155] mean value: 0.7533695031410622 key: test_roc_auc value: [0.43560606 0.65151515 0.56439394 0.69318182 0.78030303 0.5719697 0.70833333 0.8219697 0.63636364 0.68181818] mean value: 0.6545454545454545 key: train_roc_auc value: [0.80477822 0.79045307 0.78064915 0.81001333 0.80953741 0.77522368 0.79987626 0.78997716 0.79126214 0.7961165 ] mean value: 0.7947886921758995 key: test_jcc value: [0.27777778 0.46666667 0.375 0.5 0.66666667 0.33333333 0.41666667 0.73333333 0.33333333 0.46153846] mean value: 0.45643162393162395 key: train_jcc value: [0.68 0.64166667 0.63114754 0.66666667 0.66666667 0.60683761 0.65833333 0.63559322 0.63559322 0.65 ] mean value: 0.6472504921832513 MCC on Blind test: 0.17 Accuracy on Blind test: 0.59 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, 
n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.00950766 0.0092063 0.0089941 0.00913739 0.00906396 0.00937152 0.00898767 0.0092783 0.00926185 0.00896335] mean value: 0.009177207946777344 key: score_time value: [0.00941825 0.00845361 0.00863886 0.00895977 0.00840282 0.00849152 0.00842237 0.00841475 0.00835538 0.00848985] mean value: 0.008604717254638673 key: test_mcc value: [0.30240737 0.48856385 0.38932432 0.48075018 0.65151515 0.66414149 0.65151515 0.74242424 0.63636364 0.36514837] mean value: 0.5372153759035299 key: train_mcc value: [0.81500527 0.7606076 0.79704499 0.77749321 0.72682277 0.73662669 0.70844205 0.76709739 0.76829494 0.738735 ] mean value: 0.7596169926310354 key: test_accuracy value: [0.65217391 0.73913043 0.69565217 0.73913043 0.82608696 0.82608696 0.82608696 0.86956522 0.81818182 0.68181818] mean value: 0.7673913043478261 key: train_accuracy value: [0.90731707 0.87804878 0.89756098 0.88780488 0.86341463 0.86829268 0.85365854 0.88292683 0.88349515 0.86893204] mean value: 0.8791451574709922 key: test_fscore value: [0.6 0.75 0.66666667 0.7 0.83333333 0.81818182 0.83333333 0.86956522 0.81818182 0.66666667] mean value: 0.7555928853754941 key: train_fscore value: [0.90640394 0.87179487 0.89447236 0.88442211 0.8627451 0.86829268 0.84848485 0.87878788 0.88 0.86567164] mean value: 0.8761075435073197 key: test_precision value: [0.66666667 0.69230769 0.7 0.77777778 0.83333333 0.9 0.83333333 0.90909091 0.81818182 0.7 ] mean value: 0.783069153069153 key: train_precision value: [0.92 0.92391304 0.92708333 0.91666667 0.8627451 0.86407767 0.875 0.90625 0.90721649 
0.8877551 ] mean value: 0.8990707408306566 key: test_recall value: [0.54545455 0.81818182 0.63636364 0.63636364 0.83333333 0.75 0.83333333 0.83333333 0.81818182 0.63636364] mean value: 0.7340909090909091 key: train_recall value: [0.89320388 0.82524272 0.86407767 0.85436893 0.8627451 0.87254902 0.82352941 0.85294118 0.85436893 0.84466019] mean value: 0.854768703597944 key: test_roc_auc value: [0.64772727 0.74242424 0.69318182 0.73484848 0.82575758 0.82954545 0.82575758 0.87121212 0.81818182 0.68181818] mean value: 0.7670454545454546 key: train_roc_auc value: [0.90738626 0.87830763 0.89772511 0.88796878 0.86341138 0.86831334 0.85351228 0.88278127 0.88349515 0.86893204] mean value: 0.8791833238149629 key: test_jcc value: [0.42857143 0.6 0.5 0.53846154 0.71428571 0.69230769 0.71428571 0.76923077 0.69230769 0.5 ] mean value: 0.6149450549450549 key: train_jcc value: [0.82882883 0.77272727 0.80909091 0.79279279 0.75862069 0.76724138 0.73684211 0.78378378 0.78571429 0.76315789] mean value: 0.779879994190339 MCC on Blind test: 0.37 Accuracy on Blind test: 0.69 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.7920084 0.75999999 0.51944804 0.73570108 0.81945181 0.88329148 0.80979872 0.6771915 0.85314083 0.79645777] mean value: 0.7646489620208741 key: score_time value: [0.0137279 0.01358175 0.01116061 0.01169181 0.01440668 0.01173091 0.01179957 0.01326561 0.01169682 0.01486826] mean value: 0.01279299259185791 key: test_mcc value: [0.47727273 0.58930667 0.41096386 0.91605722 0.74242424 0.74047959 0.91605722 0.82575758 0.83205029 0.64715023] mean value: 0.7097519634627414 key: train_mcc value: [0.90516294 0.89271776 0.80864195 0.87320324 0.95126594 0.88361919 0.90261781 0.84407425 0.87415728 0.90291262] mean value: 0.8838372984769908 key: test_accuracy value: [0.73913043 0.7826087 0.69565217 0.95652174 0.86956522 0.86956522 0.95652174 0.91304348 0.90909091 0.81818182] mean value: 0.8509881422924901 key: train_accuracy value: [0.95121951 0.94634146 0.90243902 0.93658537 0.97560976 0.94146341 0.95121951 0.92195122 0.9368932 0.95145631] mean value: 0.941517878285579 key: test_fscore value: [0.72727273 0.8 0.72 0.95238095 0.86956522 0.88 0.96 0.91666667 0.91666667 0.8 ] mean value: 0.8542552230378317 key: train_fscore value: [0.95327103 0.9468599 0.90740741 0.93719807 0.97560976 0.94230769 0.95145631 0.9223301 0.93779904 0.95145631] mean value: 0.942569561637334 key: test_precision value: [0.72727273 0.71428571 0.64285714 1. 
0.90909091 0.84615385 0.92307692 0.91666667 0.84615385 0.88888889] mean value: 0.8414446664446664 key: train_precision value: [0.91891892 0.94230769 0.86725664 0.93269231 0.97087379 0.9245283 0.94230769 0.91346154 0.9245283 0.95145631] mean value: 0.9288331487717255 key: test_recall value: [0.72727273 0.90909091 0.81818182 0.90909091 0.83333333 0.91666667 1. 0.91666667 1. 0.72727273] mean value: 0.8757575757575757 key: train_recall value: [0.99029126 0.95145631 0.95145631 0.94174757 0.98039216 0.96078431 0.96078431 0.93137255 0.95145631 0.95145631] mean value: 0.9571197411003236 key: test_roc_auc value: [0.73863636 0.78787879 0.70075758 0.95454545 0.87121212 0.86742424 0.95454545 0.91287879 0.90909091 0.81818182] mean value: 0.8515151515151514 key: train_roc_auc value: [0.95102798 0.94631639 0.90219874 0.93656006 0.97563297 0.94155721 0.95126594 0.92199695 0.9368932 0.95145631] mean value: 0.9414905768132495 key: test_jcc value: [0.57142857 0.66666667 0.5625 0.90909091 0.76923077 0.78571429 0.92307692 0.84615385 0.84615385 0.66666667] mean value: 0.7546682484182484 key: train_jcc value: [0.91071429 0.89908257 0.83050847 0.88181818 0.95238095 0.89090909 0.90740741 0.85585586 0.88288288 0.90740741] mean value: 0.8918967107759675 MCC on Blind test: 0.29 Accuracy on Blind test: 0.64 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.0114994 0.01038074 0.00875664 0.00789666 0.00787902 0.00791883 0.008286 0.00795555 0.00806046 0.00805378] mean value: 0.00866870880126953 key: score_time value: [0.0112102 0.00881648 0.00798893 0.00786543 0.00786877 0.00795722 0.00800657 0.00791478 0.00787568 0.00790453] mean value: 0.008340859413146972 key: test_mcc value: [0.74047959 0.41096386 0.74242424 0.91666667 0.58930667 0.83971912 0.58002308 0.91666667 0.83205029 0.91287093] mean value: 0.7481171113402942 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.69565217 0.86956522 0.95652174 0.7826087 0.91304348 0.7826087 0.95652174 0.90909091 0.95454545] mean value: 0.8689723320158103 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.72 0.86956522 0.95652174 0.76190476 0.90909091 0.81481481 0.95652174 0.9 0.95238095] mean value: 0.8697942990986469 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9 0.64285714 0.83333333 0.91666667 0.88888889 1. 0.73333333 1. 1. 1. ] mean value: 0.8915079365079365 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.81818182 0.90909091 1. 0.66666667 0.83333333 0.91666667 0.91666667 0.81818182 0.90909091] mean value: 0.8606060606060606 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.86742424 0.70075758 0.87121212 0.95833333 0.78787879 0.91666667 0.77651515 0.95833333 0.90909091 0.95454545] mean value: 0.8700757575757576 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.5625 0.76923077 0.91666667 0.61538462 0.83333333 0.6875 0.91666667 0.81818182 0.90909091] mean value: 0.7778554778554778 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.03 Accuracy on Blind test: 0.51 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.09773445 0.08627319 0.08551788 0.08611631 0.08678699 0.08618236 0.08573103 0.08577466 0.08596325 0.08555889] mean value: 0.08716390132904053 key: score_time value: [0.01986361 0.01779342 0.01704097 0.01689577 0.01845622 0.01685524 0.01741266 0.01715446 0.0169692 0.01674438] mean value: 0.01751859188079834 key: test_mcc value: [0.74047959 0.76764947 0.56818182 0.82575758 0.82575758 0.91605722 0.65909298 1. 0.83205029 0.83205029] mean value: 0.7967076829215525 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.86956522 0.86956522 0.7826087 0.91304348 0.91304348 0.95652174 0.82608696 1. 0.90909091 0.90909091] mean value: 0.8948616600790513 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.88 0.7826087 0.90909091 0.91666667 0.96 0.84615385 1. 0.91666667 0.9 ] mean value: 0.896832964137312 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9 0.78571429 0.75 0.90909091 0.91666667 0.92307692 0.78571429 1. 0.84615385 1. ] mean value: 0.8816416916416916 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 1. 0.81818182 0.90909091 0.91666667 1. 0.91666667 1. 1. 0.81818182] mean value: 0.9196969696969697 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86742424 0.875 0.78409091 0.91287879 0.91287879 0.95454545 0.8219697 1. 0.90909091 0.90909091] mean value: 0.8946969696969697 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.78571429 0.64285714 0.83333333 0.84615385 0.92307692 0.73333333 1. 0.84615385 0.81818182] mean value: 0.8178804528804529 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.28 Accuracy on Blind test: 0.62 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00724578 0.00700259 0.00706482 0.00705457 0.00700569 0.00709224 0.00697279 0.00716543 0.00719452 0.00722599] mean value: 0.007102441787719726 key: score_time value: [0.00805378 0.00790071 0.00796628 0.00789952 0.00804639 0.00786138 0.00799799 0.00784135 0.00800538 0.00841856] mean value: 0.007999134063720704 key: test_mcc value: [ 0.48075018 0.47727273 0.47727273 -0.04545455 0.56490196 0.31298622 0.48075018 0.38932432 0.54772256 0.63636364] mean value: 0.4321889948051151 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73913043 0.73913043 0.73913043 0.47826087 0.7826087 0.65217391 0.73913043 0.69565217 0.77272727 0.81818182] mean value: 0.7156126482213438 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.7 0.72727273 0.72727273 0.45454545 0.8 0.63636364 0.76923077 0.72 0.76190476 0.81818182] mean value: 0.7114771894771895 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.77777778 0.72727273 0.72727273 0.45454545 0.76923077 0.7 0.71428571 0.69230769 0.8 0.81818182] mean value: 0.7180874680874682 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.63636364 0.72727273 0.72727273 0.45454545 0.83333333 0.58333333 0.83333333 0.75 0.72727273 0.81818182] mean value: 0.7090909090909091 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.73484848 0.73863636 0.73863636 0.47727273 0.78030303 0.65530303 0.73484848 0.69318182 0.77272727 0.81818182] mean value: 0.7143939393939394 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.53846154 0.57142857 0.57142857 0.29411765 0.66666667 0.46666667 0.625 0.5625 0.61538462 0.69230769] mean value: 0.5603961969403146 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: -0.1 Accuracy on Blind test: 0.45 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.09740305 1.13326406 1.09444618 1.15521884 1.09004188 1.09466553 1.09249401 1.09514403 1.09620023 1.09324002] mean value: 1.1042117834091187 key: score_time value: [0.08939648 0.09209704 0.08982658 0.08881688 0.0895195 0.08898306 0.0902555 0.0890646 0.09453082 0.08895087] mean value: 0.09014413356781006 key: test_mcc value: [0.83743579 0.58930667 0.58930667 1. 0.74242424 0.91666667 0.82575758 1. 1. 0.81818182] mean value: 0.8319079425560323 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91304348 0.7826087 0.7826087 1. 0.86956522 0.95652174 0.91304348 1. 1. 0.90909091] mean value: 0.9126482213438735 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.9 0.8 0.8 1. 0.86956522 0.95652174 0.91666667 1. 1. 0.90909091] mean value: 0.9151844532279315 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.71428571 0.71428571 1. 0.90909091 1. 0.91666667 1. 1. 0.90909091] mean value: 0.9163419913419913 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.90909091 0.90909091 1. 0.83333333 0.91666667 0.91666667 1. 1. 0.90909091] mean value: 0.9212121212121211 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.90909091 0.78787879 0.78787879 1. 0.87121212 0.95833333 0.91287879 1. 1. 0.90909091] mean value: 0.9136363636363636 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.81818182 0.66666667 0.66666667 1. 0.76923077 0.91666667 0.84615385 1. 1. 0.83333333] mean value: 0.8516899766899767 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.15 Accuracy on Blind test: 0.55 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, 
subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.8280468 0.83096433 0.89360881 0.91945672 0.93384838 0.88851166 0.8646996 0.91655803 0.83419728 0.87553668] mean value: 0.8785428285598755 key: score_time value: [0.22438312 0.13418078 0.20978713 0.20942521 0.23700547 0.21879888 0.21615219 0.20876241 0.20610762 0.21533132] mean value: 0.20799341201782226 key: test_mcc value: [0.76277007 0.6992059 0.58930667 1. 0.74242424 0.91666667 0.74047959 1. 
0.73029674 0.63636364] mean value: 0.7817513515626897 key: train_mcc value: [0.97077583 0.961154 0.98067223 0.961154 0.96116136 0.96116136 0.94219063 0.96097468 0.97091955 0.94245853] mean value: 0.9612622141858389 key: test_accuracy value: [0.86956522 0.82608696 0.7826087 1. 0.86956522 0.95652174 0.86956522 1. 0.86363636 0.81818182] mean value: 0.8855731225296443 key: train_accuracy value: [0.98536585 0.9804878 0.9902439 0.9804878 0.9804878 0.9804878 0.97073171 0.9804878 0.98543689 0.97087379] mean value: 0.9805091167416529 key: test_fscore value: [0.84210526 0.84615385 0.8 1. 0.86956522 0.95652174 0.88 1. 0.86956522 0.81818182] mean value: 0.8882093101406603 key: train_fscore value: [0.98550725 0.98076923 0.99038462 0.98076923 0.98058252 0.98058252 0.97115385 0.98039216 0.98550725 0.97142857] mean value: 0.9807077192665552 key: test_precision value: [1. 0.73333333 0.71428571 1. 0.90909091 1. 0.84615385 1. 0.83333333 0.81818182] mean value: 0.8854378954378954 key: train_precision value: [0.98076923 0.97142857 0.98095238 0.97142857 0.97115385 0.97115385 0.95283019 0.98039216 0.98076923 0.95327103] mean value: 0.9714149051235051 key: test_recall value: [0.72727273 1. 0.90909091 1. 0.83333333 0.91666667 0.91666667 1. 0.90909091 0.81818182] mean value: 0.9030303030303031 key: train_recall value: [0.99029126 0.99029126 1. 0.99029126 0.99019608 0.99019608 0.99019608 0.98039216 0.99029126 0.99029126] mean value: 0.9902436702836475 key: test_roc_auc value: [0.86363636 0.83333333 0.78787879 1. 0.87121212 0.95833333 0.86742424 1. 0.86363636 0.81818182] mean value: 0.8863636363636364 key: train_roc_auc value: [0.98534171 0.98043975 0.99019608 0.98043975 0.98053493 0.98053493 0.97082619 0.98048734 0.98543689 0.97087379] mean value: 0.9805111364934324 key: test_jcc value: [0.72727273 0.73333333 0.66666667 1. 0.76923077 0.91666667 0.78571429 1. 
0.76923077 0.69230769] mean value: 0.8060422910422911 key: train_jcc value: [0.97142857 0.96226415 0.98095238 0.96226415 0.96190476 0.96190476 0.94392523 0.96153846 0.97142857 0.94444444] mean value: 0.9622055489133606 MCC on Blind test: 0.26 Accuracy on Blind test: 0.61 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01719785 0.00701189 0.00701404 0.00702286 0.0074749 0.00695348 0.00701451 0.00692368 0.00781441 0.00701284] mean value: 0.008144044876098632 key: score_time value: [0.01571059 0.00787878 0.00795388 0.00786066 0.00811124 0.00786138 0.00785327 0.00791693 0.00871611 0.00790095] mean value: 0.008776378631591798 key: test_mcc value: [0.30240737 0.05427825 0.03816905 0.3030303 0.42228828 0.30240737 0.03816905 0.65151515 0.37796447 0.27272727] mean value: 0.2762956564879194 key: train_mcc value: [0.35623111 0.34638101 0.37560698 0.3463735 0.28783552 0.36612372 0.35687769 0.3658258 0.32044877 0.39058328] mean value: 0.3512287378448915 key: test_accuracy value: [0.65217391 0.52173913 0.52173913 0.65217391 0.69565217 0.65217391 0.52173913 0.82608696 0.68181818 0.63636364] mean value: 0.6361660079051383 key: 
train_accuracy value: [0.67804878 0.67317073 0.68780488 0.67317073 0.64390244 0.68292683 0.67804878 0.68292683 0.66019417 0.69417476] mean value: 0.6754368932038836 key: test_fscore value: [0.6 0.56 0.47619048 0.63636364 0.75862069 0.69230769 0.56 0.83333333 0.72 0.63636364] mean value: 0.6473179464213947 key: train_fscore value: [0.68571429 0.67942584 0.69230769 0.67317073 0.64390244 0.67336683 0.68571429 0.67980296 0.65686275 0.70967742] mean value: 0.6779945226077302 key: test_precision value: [0.66666667 0.5 0.5 0.63636364 0.64705882 0.64285714 0.53846154 0.83333333 0.64285714 0.63636364] mean value: 0.6243961920432509 key: train_precision value: [0.6728972 0.66981132 0.68571429 0.67647059 0.6407767 0.69072165 0.66666667 0.68316832 0.66336634 0.6754386 ] mean value: 0.6725031656102882 key: test_recall value: [0.54545455 0.63636364 0.45454545 0.63636364 0.91666667 0.75 0.58333333 0.83333333 0.81818182 0.63636364] mean value: 0.681060606060606 key: train_recall value: [0.69902913 0.68932039 0.69902913 0.66990291 0.64705882 0.65686275 0.70588235 0.67647059 0.65048544 0.74757282] mean value: 0.6841614315629164 key: test_roc_auc value: [0.64772727 0.52651515 0.51893939 0.65151515 0.68560606 0.64772727 0.51893939 0.82575758 0.68181818 0.63636364] mean value: 0.634090909090909 key: train_roc_auc value: [0.67794594 0.67309157 0.68774986 0.67318675 0.64391776 0.6828003 0.67818389 0.68289549 0.66019417 0.69417476] mean value: 0.6754140491147915 key: test_jcc value: [0.42857143 0.38888889 0.3125 0.46666667 0.61111111 0.52941176 0.38888889 0.71428571 0.5625 0.46666667] mean value: 0.4869491129785247 key: train_jcc value: [0.52173913 0.51449275 0.52941176 0.50735294 0.47482014 0.50757576 0.52173913 0.51492537 0.48905109 0.55 ] mean value: 0.5131108089860595 MCC on Blind test: 0.35 Accuracy on Blind test: 0.67 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic 
GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.08559012 0.14609933 0.03756452 0.03884339 0.04227829 0.08023286 0.03739309 0.06376338 0.03956223 0.03932667] mean value: 0.061065387725830075 key: score_time value: [0.01105213 0.01027513 0.01021671 0.00977039 0.00970769 0.01002121 0.00953698 0.00958061 0.00958157 0.00957847] mean value: 0.00993208885192871 key: test_mcc value: [0.83743579 0.58930667 0.66414149 0.91605722 0.74242424 0.91666667 0.91605722 1. 1. 0.81818182] mean value: 0.8400271120875227 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91304348 0.7826087 0.82608696 0.95652174 0.86956522 0.95652174 0.95652174 1. 1. 
0.90909091] mean value: 0.9169960474308301 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.9 0.8 0.83333333 0.95238095 0.86956522 0.95652174 0.96 1. 1. 0.90909091] mean value: 0.9180892151326934 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.71428571 0.76923077 1. 0.90909091 1. 0.92307692 1. 1. 0.90909091] mean value: 0.9224775224775225 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.90909091 0.90909091 0.90909091 0.83333333 0.91666667 1. 1. 1. 0.90909091] mean value: 0.9204545454545454 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.90909091 0.78787879 0.82954545 0.95454545 0.87121212 0.95833333 0.95454545 1. 1. 0.90909091] mean value: 0.9174242424242425 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.81818182 0.66666667 0.71428571 0.90909091 0.76923077 0.91666667 0.92307692 1. 1. 0.83333333] mean value: 0.8550532800532801 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.07 Accuracy on Blind test: 0.53 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01432657 0.03284144 0.03134632 0.03231716 0.03199744 0.03253031 0.03206563 0.03227401 0.03221512 0.03237677] mean value: 0.0304290771484375 key: score_time value: [0.0105865 0.02112436 0.02062201 0.0215013 0.01899457 0.01899886 0.01989794 0.02078581 0.01071429 0.02165031] mean value: 0.01848759651184082 key: test_mcc value: [0.48075018 0.65151515 0.39393939 1. 0.66414149 0.91666667 0.58002308 0.91666667 0.75592895 0.81818182] mean value: 0.7177813381485896 key: train_mcc value: [0.90310636 0.89271776 0.91224062 0.86358877 0.87320324 0.88292404 0.89271776 0.86341138 0.89358299 0.84481947] mean value: 0.882231240068856 key: test_accuracy value: [0.73913043 0.82608696 0.69565217 1. 0.82608696 0.95652174 0.7826087 0.95652174 0.86363636 0.90909091] mean value: 0.8555335968379447 key: train_accuracy value: [0.95121951 0.94634146 0.95609756 0.93170732 0.93658537 0.94146341 0.94634146 0.93170732 0.94660194 0.9223301 ] mean value: 0.9410395453469098 key: test_fscore value: [0.7 0.81818182 0.69565217 1. 
0.81818182 0.95652174 0.81481481 0.95652174 0.84210526 0.90909091] mean value: 0.8511070275601168 key: train_fscore value: [0.95238095 0.9468599 0.95609756 0.93137255 0.93596059 0.94117647 0.94581281 0.93137255 0.94736842 0.92156863] mean value: 0.9409970432884046 key: test_precision value: [0.77777778 0.81818182 0.66666667 1. 0.9 1. 0.73333333 1. 1. 0.90909091] mean value: 0.8805050505050505 key: train_precision value: [0.93457944 0.94230769 0.96078431 0.94059406 0.94059406 0.94117647 0.95049505 0.93137255 0.93396226 0.93069307] mean value: 0.9406558966668068 key: test_recall value: [0.63636364 0.81818182 0.72727273 1. 0.75 0.91666667 0.91666667 0.91666667 0.72727273 0.90909091] mean value: 0.8318181818181818 key: train_recall value: [0.97087379 0.95145631 0.95145631 0.9223301 0.93137255 0.94117647 0.94117647 0.93137255 0.96116505 0.91262136] mean value: 0.9415000951837046 key: test_roc_auc value: [0.73484848 0.82575758 0.6969697 1. 0.82954545 0.95833333 0.77651515 0.95833333 0.86363636 0.90909091] mean value: 0.8553030303030302 key: train_roc_auc value: [0.95112317 0.94631639 0.95612031 0.93175328 0.93656006 0.94146202 0.94631639 0.93170569 0.94660194 0.9223301 ] mean value: 0.9410289358461832 key: test_jcc value: [0.53846154 0.69230769 0.53333333 1. 
0.69230769 0.91666667 0.6875 0.91666667 0.72727273 0.83333333] mean value: 0.7537849650349651 key: train_jcc value: [0.90909091 0.89908257 0.91588785 0.87155963 0.87962963 0.88888889 0.89719626 0.87155963 0.9 0.85454545] mean value: 0.8887440829166801 MCC on Blind test: 0.07 Accuracy on Blind test: 0.53 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02144146 0.00733709 0.00703239 0.00698829 0.00709367 0.0077188 0.00785375 0.00805616 0.00777936 0.00778151] mean value: 0.008908247947692871 key: score_time value: [0.00882101 0.00820684 0.00806975 0.00794053 0.00805497 0.00866318 0.00886822 0.00857925 0.00864053 0.00864553] mean value: 0.008448982238769531 key: test_mcc value: [0.30240737 0.05427825 0.21969697 0.3030303 0.39727608 0.30240737 0.39393939 0.56818182 0.29277002 0.09090909] mean value: 0.29248966588600495 key: train_mcc value: [0.37650652 0.40495245 0.36648346 0.34638101 0.41611143 0.36642547 0.29790481 0.32736295 0.38836782 0.33048671] mean value: 0.3620982636826904 key: test_accuracy value: [0.65217391 0.52173913 0.60869565 0.65217391 0.69565217 0.65217391 0.69565217 0.7826087 0.63636364 0.54545455] mean value: 0.6442687747035574 
key: train_accuracy value: [0.68780488 0.70243902 0.68292683 0.67317073 0.70731707 0.68292683 0.64878049 0.66341463 0.69417476 0.66504854] mean value: 0.6808003788775752 key: test_fscore value: [0.6 0.56 0.60869565 0.63636364 0.74074074 0.69230769 0.69565217 0.7826087 0.69230769 0.54545455] mean value: 0.6554130828913438 key: train_fscore value: [0.70093458 0.70813397 0.69483568 0.67942584 0.71698113 0.68899522 0.65384615 0.66985646 0.69565217 0.67298578] mean value: 0.6881646985269205 key: test_precision value: [0.66666667 0.5 0.58333333 0.63636364 0.66666667 0.64285714 0.72727273 0.81818182 0.6 0.54545455] mean value: 0.6386796536796537 key: train_precision value: [0.67567568 0.69811321 0.67272727 0.66981132 0.69090909 0.6728972 0.64150943 0.65420561 0.69230769 0.65740741] mean value: 0.6725563905029608 key: test_recall value: [0.54545455 0.63636364 0.63636364 0.63636364 0.83333333 0.75 0.66666667 0.75 0.81818182 0.54545455] mean value: 0.6818181818181818 key: train_recall value: [0.72815534 0.7184466 0.7184466 0.68932039 0.74509804 0.70588235 0.66666667 0.68627451 0.69902913 0.68932039] mean value: 0.7046640015229393 key: test_roc_auc value: [0.64772727 0.52651515 0.60984848 0.65151515 0.68939394 0.64772727 0.6969697 0.78409091 0.63636364 0.54545455] mean value: 0.643560606060606 key: train_roc_auc value: [0.68760708 0.70236056 0.68275271 0.67309157 0.70750048 0.68303826 0.64886731 0.6635256 0.69417476 0.66504854] mean value: 0.6807966876070817 key: test_jcc value: [0.42857143 0.38888889 0.4375 0.46666667 0.58823529 0.52941176 0.53333333 0.64285714 0.52941176 0.375 ] mean value: 0.49198762838468724 key: train_jcc value: [0.53956835 0.54814815 0.5323741 0.51449275 0.55882353 0.52554745 0.48571429 0.50359712 0.53333333 0.50714286] mean value: 0.5248741920974376 MCC on Blind test: 0.39 Accuracy on Blind test: 0.69 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', 
LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00869298 0.01086235 0.01095939 0.01100326 0.01040626 0.01017833 0.01074862 0.01032758 0.01047254 0.01042032] mean value: 0.010407161712646485 key: score_time value: [0.00884461 0.01061177 0.01056266 0.01120257 0.01041794 0.01044512 0.01040959 0.01039767 0.01039672 0.0103991 ] mean value: 0.010368776321411134 key: test_mcc value: [0.65909298 0.47923384 0.5164589 0.69084928 0.74242424 0.74242424 0.76277007 0.69084928 0.83205029 0.54232614] mean value: 0.6658479277717093 key: train_mcc value: [0.8373082 0.60342152 0.90672005 0.74004127 0.84982541 0.84787319 0.87166073 0.56519801 0.82977382 0.63500064] mean value: 0.7686822837406401 key: test_accuracy value: [0.82608696 0.69565217 0.73913043 0.82608696 0.86956522 0.86956522 0.86956522 0.82608696 0.90909091 0.72727273] mean value: 0.8158102766798419 key: train_accuracy value: [0.91707317 0.77073171 0.95121951 0.85365854 0.92195122 0.92195122 0.93170732 0.74146341 0.90776699 0.79126214] mean value: 0.8708785223774568 key: test_fscore value: [0.8 0.53333333 0.76923077 0.77777778 0.86956522 0.86956522 0.88888889 0.85714286 0.91666667 0.625 ] mean value: 0.7907170727822902 key: train_fscore value: [0.92093023 0.70807453 0.9537037 0.82954545 0.92592593 0.91752577 
0.93577982 0.79377432 0.91555556 0.73939394] mean value: 0.8640209254619995 key: test_precision value: [0.88888889 1. 0.66666667 1. 0.90909091 0.90909091 0.8 0.75 0.84615385 1. ] mean value: 0.8769891219891219 key: train_precision value: [0.88392857 0.98275862 0.91150442 1. 0.87719298 0.9673913 0.87931034 0.65806452 0.8442623 0.98387097] mean value: 0.8988284027481475 key: test_recall value: [0.72727273 0.36363636 0.90909091 0.63636364 0.83333333 0.83333333 1. 1. 1. 0.45454545] mean value: 0.7757575757575758 key: train_recall value: [0.96116505 0.55339806 1. 0.70873786 0.98039216 0.87254902 1. 1. 1. 0.59223301] mean value: 0.8668475157053113 key: test_roc_auc value: [0.8219697 0.68181818 0.74621212 0.81818182 0.87121212 0.87121212 0.86363636 0.81818182 0.90909091 0.72727273] mean value: 0.8128787878787879 key: train_roc_auc value: [0.91685703 0.77179707 0.95098039 0.85436893 0.92223491 0.9217114 0.93203883 0.74271845 0.90776699 0.79126214] mean value: 0.8711736150770988 key: test_jcc value: [0.66666667 0.36363636 0.625 0.63636364 0.76923077 0.76923077 0.8 0.75 0.84615385 0.45454545] mean value: 0.6680827505827506 key: train_jcc value: [0.85344828 0.54807692 0.91150442 0.70873786 0.86206897 0.84761905 0.87931034 0.65806452 0.8442623 0.58653846] mean value: 0.769963111850876 MCC on Blind test: 0.26 Accuracy on Blind test: 0.63 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', 
RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01068044 0.01032329 0.01064396 0.01066589 0.01067901 0.01030278 0.01019835 0.01055288 0.01074505 0.01002216] mean value: 0.0104813814163208 key: score_time value: [0.01096964 0.01078129 0.01076531 0.01084399 0.01041245 0.01038766 0.01039839 0.0104928 0.01040888 0.01040125] mean value: 0.010586166381835937 key: test_mcc value: [0.56490196 0.40451992 0.33371191 0.74047959 0.74242424 0.58002308 0.74242424 0.91666667 1. 0.31622777] mean value: 0.6341379361751176 key: train_mcc value: [0.84083863 0.65525342 0.84965937 0.84102851 0.91330072 0.82620413 0.82825757 0.85400014 0.87581131 0.46017899] mean value: 0.7944532795671018 key: test_accuracy value: [0.7826087 0.65217391 0.65217391 0.86956522 0.86956522 0.7826087 0.86956522 0.95652174 1. 0.59090909] mean value: 0.8025691699604743 key: train_accuracy value: [0.91707317 0.8 0.92195122 0.91707317 0.95609756 0.90731707 0.91219512 0.92682927 0.9368932 0.67475728] mean value: 0.8870187070802747 key: test_fscore value: [0.76190476 0.42857143 0.69230769 0.85714286 0.86956522 0.81481481 0.86956522 0.95652174 1. 0.30769231] mean value: 0.7558086036346906 key: train_fscore value: [0.92237443 0.75151515 0.9266055 0.9119171 0.9569378 0.91402715 0.90721649 0.92537313 0.93896714 0.51798561] mean value: 0.8672919508970722 key: test_precision value: [0.8 1. 0.6 0.9 0.90909091 0.73333333 0.90909091 1. 1. 1. ] mean value: 0.8851515151515151 key: train_precision value: [0.87068966 1. 0.87826087 0.97777778 0.93457944 0.8487395 0.95652174 0.93939394 0.90909091 1. 
] mean value: 0.9315053825181348 key: test_recall value: [0.72727273 0.27272727 0.81818182 0.81818182 0.83333333 0.91666667 0.83333333 0.91666667 1. 0.18181818] mean value: 0.7318181818181818 key: train_recall value: [0.98058252 0.60194175 0.98058252 0.85436893 0.98039216 0.99019608 0.8627451 0.91176471 0.97087379 0.34951456] mean value: 0.848296211688559 key: test_roc_auc value: [0.78030303 0.63636364 0.65909091 0.86742424 0.87121212 0.77651515 0.87121212 0.95833333 1. 0.59090909] mean value: 0.8011363636363636 key: train_roc_auc value: [0.91676185 0.80097087 0.92166381 0.91738054 0.9562155 0.9077194 0.91195507 0.92675614 0.9368932 0.67475728] mean value: 0.8871073672187322 key: test_jcc value: [0.61538462 0.27272727 0.52941176 0.75 0.76923077 0.6875 0.76923077 0.91666667 1. 0.18181818] mean value: 0.6491970039764158 key: train_jcc value: [0.8559322 0.60194175 0.86324786 0.83809524 0.91743119 0.84166667 0.83018868 0.86111111 0.88495575 0.34951456] mean value: 0.7844085017308544 MCC on Blind test: 0.21 Accuracy on Blind test: 0.61 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.08768725 0.0763936 0.07663751 0.07690692 0.07599807 0.07673168 0.07730794 0.07686925 0.07642341 0.07606792] mean value: 0.07770235538482666 key: score_time value: [0.01560497 0.01552868 0.0159018 0.01562333 0.01548648 0.01561403 0.01570988 0.0160079 0.01545119 0.0155468 ] mean value: 0.015647506713867186 key: test_mcc value: [0.91605722 0.6992059 0.74242424 0.83743579 0.74242424 0.91666667 0.91605722 0.83971912 1. 1. ] mean value: 0.8609990412070886 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95652174 0.82608696 0.86956522 0.91304348 0.86956522 0.95652174 0.95652174 0.91304348 1. 1. ] mean value: 0.9260869565217391 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95238095 0.84615385 0.86956522 0.9 0.86956522 0.95652174 0.96 0.90909091 1. 1. ] mean value: 0.9263277881538751 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.73333333 0.83333333 1. 0.90909091 1. 0.92307692 1. 1. 1. ] mean value: 0.9398834498834499 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 1. 0.90909091 0.81818182 0.83333333 0.91666667 1. 0.83333333 1. 1. ] mean value: 0.921969696969697 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95454545 0.83333333 0.87121212 0.90909091 0.87121212 0.95833333 0.95454545 0.91666667 1. 1. ] mean value: 0.9268939393939394 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.90909091 0.73333333 0.76923077 0.81818182 0.76923077 0.91666667 0.92307692 0.83333333 1. 1. ] mean value: 0.8672144522144523 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: -0.02 Accuracy on Blind test: 0.49 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.03209019 0.02829218 0.03167534 0.03128099 0.03293514 0.02934527 0.03381133 0.02705407 0.03058004 0.02823019] mean value: 0.030529475212097167 key: score_time value: [0.01726365 0.02387595 0.02088284 0.02278829 0.02153826 0.01731133 0.01608276 0.01754308 0.02704978 0.01536942] mean value: 0.01997053623199463 key: test_mcc value: [0.83743579 0.39393939 0.66414149 1. 0.74242424 0.91666667 0.91605722 1. 0.91287093 0.81818182] mean value: 0.820171755257551 key: train_mcc value: [0.98048734 0.99029034 0.99029034 0.99029126 0.98067223 0.98048734 0.99029034 0.99029034 0.96189066 0.99033794] mean value: 0.9845328141111404 key: test_accuracy value: [0.91304348 0.69565217 0.82608696 1. 0.86956522 0.95652174 0.95652174 1. 
0.95454545 0.90909091] mean value: 0.908102766798419 key: train_accuracy value: [0.9902439 0.99512195 0.99512195 0.99512195 0.9902439 0.9902439 0.99512195 0.99512195 0.98058252 0.99514563] mean value: 0.992206961875444 key: test_fscore value: [0.9 0.69565217 0.83333333 1. 0.86956522 0.95652174 0.96 1. 0.95238095 0.90909091] mean value: 0.9076544325239977 key: train_fscore value: [0.99029126 0.99516908 0.99516908 0.99512195 0.99009901 0.99019608 0.99507389 0.99507389 0.98019802 0.99512195] mean value: 0.9921514220211729 key: test_precision value: [1. 0.66666667 0.76923077 1. 0.90909091 1. 0.92307692 1. 1. 0.90909091] mean value: 0.9177156177156177 key: train_precision value: [0.99029126 0.99038462 0.99038462 1. 1. 0.99019608 1. 1. 1. 1. ] mean value: 0.9961256571336525 key: test_recall value: [0.81818182 0.72727273 0.90909091 1. 0.83333333 0.91666667 1. 1. 0.90909091 0.90909091] mean value: 0.9022727272727272 key: train_recall value: [0.99029126 1. 1. 0.99029126 0.98039216 0.99019608 0.99019608 0.99019608 0.96116505 0.99029126] mean value: 0.988301922710832 key: test_roc_auc value: [0.90909091 0.6969697 0.82954545 1. 0.87121212 0.95833333 0.95454545 1. 0.95454545 0.90909091] mean value: 0.9083333333333333 key: train_roc_auc value: [0.99024367 0.99509804 0.99509804 0.99514563 0.99019608 0.99024367 0.99509804 0.99509804 0.98058252 0.99514563] mean value: 0.992194936226918 key: test_jcc value: [0.81818182 0.53333333 0.71428571 1. 0.76923077 0.91666667 0.92307692 1. 
0.90909091 0.83333333] mean value: 0.8417199467199468 key: train_jcc value: [0.98076923 0.99038462 0.99038462 0.99029126 0.98039216 0.98058252 0.99019608 0.99019608 0.96116505 0.99029126] mean value: 0.984465287235133 MCC on Blind test: 0.07 Accuracy on Blind test: 0.53 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.0418961 0.05101323 0.05104518 0.05136704 0.05105376 0.05128527 0.05111003 0.04877734 0.05096364 0.04899049] mean value: 0.04975020885467529 key: score_time value: [0.02236867 0.01667118 0.022753 0.02080393 0.02091503 0.02079797 0.01660824 0.01153994 0.01939178 0.01141834] mean value: 0.018326807022094726 key: test_mcc value: [0.12336594 0.39393939 0.56490196 0.39727608 0.74047959 0.41096386 0.58930667 0.65151515 0.48795004 0.2773501 ] mean value: 0.46370487732579513 key: train_mcc value: [0.95126131 0.92355447 0.90401389 0.93283198 0.93209539 0.92211753 0.91257158 0.91325992 0.90308289 0.89358299] mean value: 0.9188371932639088 key: test_accuracy value: [0.56521739 0.69565217 0.7826087 0.69565217 0.86956522 0.69565217 0.7826087 0.82608696 0.72727273 0.63636364] mean value: 0.7276679841897233 key: train_accuracy value: [0.97560976 
0.96097561 0.95121951 0.96585366 0.96585366 0.96097561 0.95609756 0.95609756 0.95145631 0.94660194] mean value: 0.9590741179256452 key: test_fscore value: [0.44444444 0.69565217 0.76190476 0.63157895 0.88 0.66666667 0.76190476 0.83333333 0.66666667 0.6 ] mean value: 0.69421517562021 key: train_fscore value: [0.97584541 0.96 0.95 0.96517413 0.96517413 0.96039604 0.95522388 0.95477387 0.95098039 0.94581281] mean value: 0.9583380658920831 key: test_precision value: [0.57142857 0.66666667 0.8 0.75 0.84615385 0.77777778 0.88888889 0.83333333 0.85714286 0.66666667] mean value: 0.7658058608058608 key: train_precision value: [0.97115385 0.98969072 0.97938144 0.98979592 0.97979798 0.97 0.96969697 0.97938144 0.96039604 0.96 ] mean value: 0.9749294361867525 key: test_recall value: [0.36363636 0.72727273 0.72727273 0.54545455 0.91666667 0.58333333 0.66666667 0.83333333 0.54545455 0.54545455] mean value: 0.6454545454545455 key: train_recall value: [0.98058252 0.93203883 0.9223301 0.94174757 0.95098039 0.95098039 0.94117647 0.93137255 0.94174757 0.93203883] mean value: 0.9424995240814773 key: test_roc_auc value: [0.55681818 0.6969697 0.78030303 0.68939394 0.86742424 0.70075758 0.78787879 0.82575758 0.72727273 0.63636364] mean value: 0.7268939393939393 key: train_roc_auc value: [0.97558538 0.96111746 0.95136113 0.96597183 0.96578146 0.96092709 0.95602513 0.95597754 0.95145631 0.94660194] mean value: 0.9590805254140491 key: test_jcc value: [0.28571429 0.53333333 0.61538462 0.46153846 0.78571429 0.5 0.61538462 0.71428571 0.5 0.42857143] mean value: 0.543992673992674 key: train_jcc value: [0.95283019 0.92307692 0.9047619 0.93269231 0.93269231 0.92380952 0.91428571 0.91346154 0.90654206 0.89719626] mean value: 0.9201348726216474 MCC on Blind test: 0.32 Accuracy on Blind test: 0.66 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', 
LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', 
MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.15189385 0.14156413 0.14368677 0.14271593 0.13952732 0.14208126 0.13928676 0.1380887 0.14191723 0.14071012] mean value: 0.14214720726013183 key: score_time value: [0.00953889 0.00933433 0.0094347 0.00952435 0.00953817 0.00918531 0.00900173 0.00848866 0.00957823 0.00928164] mean value: 0.009290599822998047 key: test_mcc value: [0.76277007 0.5164589 0.66414149 0.83743579 0.74242424 0.91666667 0.91605722 0.91666667 1. 0.81818182] mean value: 0.8090802870032009 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.73913043 0.82608696 0.91304348 0.86956522 0.95652174 0.95652174 0.95652174 1. 0.90909091] mean value: 0.899604743083004 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.84210526 0.76923077 0.83333333 0.9 0.86956522 0.95652174 0.96 0.95652174 1. 0.90909091] mean value: 0.8996368970465081 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.66666667 0.76923077 1. 0.90909091 1. 0.92307692 1. 1. 0.90909091] mean value: 0.9177156177156177 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.90909091 0.90909091 0.81818182 0.83333333 0.91666667 1. 0.91666667 1. 
0.90909091] mean value: 0.8939393939393939 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86363636 0.74621212 0.82954545 0.90909091 0.87121212 0.95833333 0.95454545 0.95833333 1. 0.90909091] mean value: 0.9 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.72727273 0.625 0.71428571 0.81818182 0.76923077 0.91666667 0.92307692 0.91666667 1. 0.83333333] mean value: 0.8243714618714619 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.54 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive
Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01663566 0.01202774 0.01275182 0.01185322 0.01200604 0.01168084 0.01452398 0.0116775 0.01183033 0.01218224] mean value: 0.012716937065124511 key: score_time value: [0.01137972 0.01086974 0.01099038 0.01081109 0.01094103 0.01092815 0.01124692 0.01074839 0.01087403 0.01108217] mean value: 0.010987162590026855 key: test_mcc value: [0.15096491 0.56879646 0.29359034 0.33371191 0.55048188 0.65909298 0.40451992 0.55048188 0.56694671 0.54232614] mean value: 0.4620913131591764 key: train_mcc value: [0.58647158 0.65859127 0.56715421 0.63490794 0.52720108 0.49387839 0.4975669 0.55024014 0.59539971 0.63353022] mean value: 
0.5744941432717103 key: test_accuracy value: [0.56521739 0.73913043 0.60869565 0.65217391 0.73913043 0.82608696 0.65217391 0.73913043 0.77272727 0.72727273] mean value: 0.7021739130434782 key: train_accuracy value: [0.76097561 0.8097561 0.76585366 0.79512195 0.72195122 0.73170732 0.69756098 0.73170732 0.77669903 0.78640777] mean value: 0.7577740942457968 key: test_fscore value: [0.61538462 0.78571429 0.68965517 0.69230769 0.8 0.84615385 0.75 0.8 0.8 0.78571429] mean value: 0.7564929897688518 key: train_fscore value: [0.80632411 0.83817427 0.80165289 0.82786885 0.77992278 0.76987448 0.76691729 0.78764479 0.81147541 0.824 ] mean value: 0.8013854877176021 key: test_precision value: [0.53333333 0.64705882 0.55555556 0.6 0.66666667 0.78571429 0.6 0.66666667 0.71428571 0.64705882] mean value: 0.6416339869281046 key: train_precision value: [0.68 0.73188406 0.69784173 0.71631206 0.6433121 0.67153285 0.62195122 0.64968153 0.70212766 0.70068027] mean value: 0.6815323469811392 key: test_recall value: [0.72727273 1. 0.90909091 0.81818182 1. 0.91666667 1. 1. 0.90909091 1. ] mean value: 0.928030303030303 key: train_recall value: [0.99029126 0.98058252 0.94174757 0.98058252 0.99019608 0.90196078 1. 1. 0.96116505 1. 
] mean value: 0.9746525794783933 key: test_roc_auc value: [0.5719697 0.75 0.62121212 0.65909091 0.72727273 0.8219697 0.63636364 0.72727273 0.77272727 0.72727273] mean value: 0.7015151515151515 key: train_roc_auc value: [0.75985151 0.80891871 0.76499143 0.79421283 0.72325338 0.73253379 0.69902913 0.73300971 0.77669903 0.78640777] mean value: 0.7578907291071768 key: test_jcc value: [0.44444444 0.64705882 0.52631579 0.52941176 0.66666667 0.73333333 0.6 0.66666667 0.66666667 0.64705882] mean value: 0.6127622979016167 key: train_jcc value: [0.67549669 0.72142857 0.66896552 0.70629371 0.63924051 0.62585034 0.62195122 0.64968153 0.68275862 0.70068027] mean value: 0.6692346971143661 MCC on Blind test: 0.36 Accuracy on Blind test: 0.61 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', 
random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01373434 0.01035643 0.01028872 0.01046038 0.01034403 0.01036429 0.01039052 0.01046538 0.01036501 0.01036739] mean value: 0.010713648796081544 key: score_time value: [0.01092839 0.01048851 0.01048732 0.01047397 0.01050997 0.01047373 0.01039219 0.0105195 0.01046062 0.01039433] mean value: 0.010512852668762207 key: test_mcc value: [0.62050523 0.74242424 0.47727273 0.91605722 0.66414149 0.82575758 0.82575758 0.83971912 0.91287093 0.73029674] mean value: 0.7554802857310763 key: train_mcc value: [0.87320324 0.85404174 0.8742382 0.82504775 0.87320324 0.83447633 0.86356283 0.85368872 0.86424061 0.86424061] mean value: 0.8579943273085286 key: test_accuracy value: [0.7826087 0.86956522 0.73913043 0.95652174 0.82608696 0.91304348 0.91304348 0.91304348 0.95454545 0.86363636] mean value: 0.8731225296442687 key: train_accuracy value: [0.93658537 0.92682927 0.93658537 0.91219512 0.93658537 0.91707317 0.93170732 0.92682927 0.93203883 0.93203883] mean value: 0.9288467913805352 key: test_fscore value: [0.70588235 0.86956522 0.72727273 0.95238095 0.81818182 0.91666667 0.91666667 0.90909091 0.95238095 0.85714286] mean value: 0.862523112011603 key: train_fscore value: [0.93719807 0.92610837 0.93532338 0.91089109 0.93596059 0.91542289 0.93069307 0.92610837 0.93137255 0.93137255] mean value: 0.9280450932646102 key: test_precision value: [1. 0.83333333 0.72727273 1. 0.9 0.91666667 0.91666667 1. 1. 
0.9 ] mean value: 0.9193939393939394 key: train_precision value: [0.93269231 0.94 0.95918367 0.92929293 0.94059406 0.92929293 0.94 0.93069307 0.94059406 0.94059406] mean value: 0.9382937087272306 key: test_recall value: [0.54545455 0.90909091 0.72727273 0.90909091 0.75 0.91666667 0.91666667 0.83333333 0.90909091 0.81818182] mean value: 0.8234848484848485 key: train_recall value: [0.94174757 0.91262136 0.91262136 0.89320388 0.93137255 0.90196078 0.92156863 0.92156863 0.9223301 0.9223301 ] mean value: 0.9181324957167333 key: test_roc_auc value: [0.77272727 0.87121212 0.73863636 0.95454545 0.82954545 0.91287879 0.91287879 0.91666667 0.95454545 0.86363636] mean value: 0.8727272727272727 key: train_roc_auc value: [0.93656006 0.92689891 0.93670284 0.91228822 0.93656006 0.91699981 0.9316581 0.92680373 0.93203883 0.93203883] mean value: 0.9288549400342662 key: test_jcc value: [0.54545455 0.76923077 0.57142857 0.90909091 0.69230769 0.84615385 0.84615385 0.83333333 0.90909091 0.75 ] mean value: 0.7672244422244422 key: train_jcc value: [0.88181818 0.86238532 0.87850467 0.83636364 0.87962963 0.8440367 0.87037037 0.86238532 0.87155963 0.87155963] mean value: 0.8658613096583602 MCC on Blind test: 0.21 Accuracy on Blind test: 0.6 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:143: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:146: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by 
= ['bts_mcc'], ascending = False, inplace = True) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.08771014 0.08218527 0.08186293 0.08213329 0.14164805 0.1171093 0.08225846 0.0825932 0.08844352 0.08227658] mean value: 0.09282207489013672 key: score_time value: [0.01075149 0.01060557 0.01062846 0.01066661 0.01069403 0.01066709 0.01072025 0.01071143 0.01065779 0.01065183] mean value: 0.010675454139709472 key: test_mcc value: [0.69084928 0.74242424 0.39393939 1. 0.66414149 0.91666667 0.65909298 0.91666667 0.91287093 0.91287093] mean value: 0.7809522578121664 key: train_mcc value: [0.87321531 0.85404174 0.90261781 0.85404174 0.87320324 0.88292404 0.88308106 0.86341138 0.87382759 0.8544092 ] mean value: 0.8714773106645599 key: test_accuracy value: [0.82608696 0.86956522 0.69565217 1. 0.82608696 0.95652174 0.82608696 0.95652174 0.95454545 0.95454545] mean value: 0.8865612648221344 key: train_accuracy value: [0.93658537 0.92682927 0.95121951 0.92682927 0.93658537 0.94146341 0.94146341 0.93170732 0.9368932 0.92718447] mean value: 0.9356760596732181 key: test_fscore value: [0.77777778 0.86956522 0.69565217 1. 
0.81818182 0.95652174 0.84615385 0.95652174 0.95238095 0.95652174] mean value: 0.8829277003190047 key: train_fscore value: [0.93658537 0.92610837 0.95098039 0.92610837 0.93596059 0.94117647 0.94059406 0.93137255 0.93658537 0.92682927] mean value: 0.9352300811072124 key: test_precision value: [1. 0.83333333 0.66666667 1. 0.9 1. 0.78571429 1. 1. 0.91666667] mean value: 0.9102380952380953 key: train_precision value: [0.94117647 0.94 0.96039604 0.94 0.94059406 0.94117647 0.95 0.93137255 0.94117647 0.93137255] mean value: 0.9417264608813822 key: test_recall value: [0.63636364 0.90909091 0.72727273 1. 0.75 0.91666667 0.91666667 0.91666667 0.90909091 1. ] mean value: 0.8681818181818182 key: train_recall value: [0.93203883 0.91262136 0.94174757 0.91262136 0.93137255 0.94117647 0.93137255 0.93137255 0.93203883 0.9223301 ] mean value: 0.9288692175899487 key: test_roc_auc value: [0.81818182 0.87121212 0.6969697 1. 0.82954545 0.95833333 0.8219697 0.95833333 0.95454545 0.95454545] mean value: 0.8863636363636364 key: train_roc_auc value: [0.93660765 0.92689891 0.95126594 0.92689891 0.93656006 0.94146202 0.94141443 0.93170569 0.9368932 0.92718447] mean value: 0.9356891300209405 key: test_jcc value: [0.63636364 0.76923077 0.53333333 1. 
0.69230769 0.91666667 0.73333333 0.91666667 0.90909091 0.91666667] mean value: 0.8023659673659673 key: train_jcc value: [0.88073394 0.86238532 0.90654206 0.86238532 0.87962963 0.88888889 0.88785047 0.87155963 0.88073394 0.86363636] mean value: 0.8784345570656983 MCC on Blind test: 0.09 Accuracy on Blind test: 0.54 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost 
Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.01983237 0.0227282 0.0234642 0.02106881 0.01955771 0.02060318 0.02057695 0.02065086 0.02190328 0.02053022] mean value: 0.021091580390930176 key: score_time value: [0.01070762 0.01104808 0.01071906 0.01067495 0.01204062 0.0107677 0.01073122 0.01068878 0.01067734 0.01063418] mean value: 0.0108689546585083 key: test_mcc value: [0.58002308 0.48856385 0.23262105 0.65909298 0.65909298 0.83971912 0.91605722 0.82575758 1. 0.27272727] mean value: 0.6473655141770323 key: train_mcc value: [0.78548989 0.77565201 0.83417421 0.74754561 0.77565201 0.75613935 0.76601619 0.77565201 0.77673564 0.78640777] mean value: 0.7779464673314072 key: test_accuracy value: [0.7826087 0.73913043 0.60869565 0.82608696 0.82608696 0.91304348 0.95652174 0.91304348 1. 
0.63636364] mean value: 0.8201581027667985 key: train_accuracy value: [0.89268293 0.88780488 0.91707317 0.87317073 0.88780488 0.87804878 0.88292683 0.88780488 0.88834951 0.89320388] mean value: 0.8888870471228985 key: test_fscore value: [0.73684211 0.75 0.64 0.8 0.84615385 0.90909091 0.96 0.91666667 1. 0.63636364] mean value: 0.8195117163538216 key: train_fscore value: [0.89423077 0.88780488 0.9178744 0.87735849 0.88780488 0.87804878 0.88349515 0.88780488 0.88888889 0.89320388] mean value: 0.8896514988581322 key: test_precision value: [0.875 0.69230769 0.57142857 0.88888889 0.78571429 1. 0.92307692 0.91666667 1. 0.63636364] mean value: 0.8289446664446665 key: train_precision value: [0.88571429 0.89215686 0.91346154 0.85321101 0.88349515 0.87378641 0.875 0.88349515 0.88461538 0.89320388] mean value: 0.8838139663234891 key: test_recall value: [0.63636364 0.81818182 0.72727273 0.72727273 0.91666667 0.83333333 1. 0.91666667 1. 0.63636364] mean value: 0.8212121212121212 key: train_recall value: [0.90291262 0.88349515 0.9223301 0.90291262 0.89215686 0.88235294 0.89215686 0.89215686 0.89320388 0.89320388] mean value: 0.895688178183895 key: test_roc_auc value: [0.77651515 0.74242424 0.61363636 0.8219697 0.8219697 0.91666667 0.95454545 0.91287879 1. 0.63636364] mean value: 0.8196969696969697 key: train_roc_auc value: [0.89263278 0.887826 0.9170474 0.87302494 0.887826 0.87806967 0.88297164 0.887826 0.88834951 0.89320388] mean value: 0.8888777841233582 key: test_jcc value: [0.58333333 0.6 0.47058824 0.66666667 0.73333333 0.83333333 0.92307692 0.84615385 1. 
0.46666667] mean value: 0.7123152337858221 key: train_jcc value: [0.80869565 0.79824561 0.84821429 0.78151261 0.79824561 0.7826087 0.79130435 0.79824561 0.8 0.80701754] mean value: 0.8014089972373388 MCC on Blind test: 0.35 Accuracy on Blind test: 0.67 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.63725948 0.8160131 0.64619517 0.67831397 0.81363416 0.64932156 0.64309788 0.77185845 0.69442797 0.65856528] mean value: 0.7008687019348144 key: score_time value: [0.01391268 0.01416373 0.01514912 0.01428485 0.01425004 0.01425481 0.01451612 0.01446199 0.01425672 0.01421475] mean value: 0.014346480369567871 key: test_mcc value: [0.58002308 0.74242424 0.58930667 0.74047959 0.74242424 0.82575758 0.74242424 0.58930667 0.75592895 0.75592895] mean value: 0.7064004193432867 key: train_mcc value: [1. 0.99029034 0.93174679 1. 0.96116136 0.93174679 0.87355997 0.92194936 0.99033794 0.97091955] mean value: 0.9571712098991773 key: test_accuracy value: [0.7826087 0.86956522 0.7826087 0.86956522 0.86956522 0.91304348 0.86956522 0.7826087 0.86363636 0.86363636] mean value: 0.8466403162055336 key: train_accuracy value: [1. 0.99512195 0.96585366 1. 
0.9804878 0.96585366 0.93658537 0.96097561 0.99514563 0.98543689] mean value: 0.9785460573052333 key: test_fscore value: [0.73684211 0.86956522 0.8 0.85714286 0.86956522 0.91666667 0.86956522 0.76190476 0.84210526 0.88 ] mean value: 0.8403357306309251 key: train_fscore value: [1. 0.99516908 0.96618357 1. 0.98058252 0.96551724 0.93719807 0.96078431 0.99512195 0.98550725] mean value: 0.978606400161065 key: test_precision value: [0.875 0.83333333 0.71428571 0.9 0.90909091 0.91666667 0.90909091 0.88888889 1. 0.78571429] mean value: 0.8732070707070707 key: train_precision value: [1. 0.99038462 0.96153846 1. 0.97115385 0.97029703 0.92380952 0.96078431 1. 0.98076923] mean value: 0.9758737021084138 key: test_recall value: [0.63636364 0.90909091 0.90909091 0.81818182 0.83333333 0.91666667 0.83333333 0.66666667 0.72727273 1. ] mean value: 0.825 key: train_recall value: [1. 1. 0.97087379 1. 0.99019608 0.96078431 0.95098039 0.96078431 0.99029126 0.99029126] mean value: 0.9814201408718828 key: test_roc_auc value: [0.77651515 0.87121212 0.78787879 0.86742424 0.87121212 0.91287879 0.87121212 0.78787879 0.86363636 0.86363636] mean value: 0.8473484848484848 key: train_roc_auc value: [1. 0.99509804 0.96582905 1. 0.98053493 0.96582905 0.93665524 0.96097468 0.99514563 0.98543689] mean value: 0.9785503521797069 key: test_jcc value: [0.58333333 0.76923077 0.66666667 0.75 0.76923077 0.84615385 0.76923077 0.61538462 0.72727273 0.78571429] mean value: 0.7282217782217782 key: train_jcc value: [1. 0.99038462 0.93457944 1. 
0.96190476 0.93333333 0.88181818 0.9245283 0.99029126 0.97142857] mean value: 0.9588268467144515 MCC on Blind test: 0.08 Accuracy on Blind test: 0.54 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.00974512 0.00931644 0.00723195 0.00713229 0.00715876 0.00732064 0.00752759 0.00799894 0.00718284 0.00744987] mean value: 0.00780644416809082 key: score_time value: [0.01071715 0.00873137 0.00839925 0.0081389 0.008322 0.00864697 0.00856805 0.0087285 0.00848913 0.00818849] mean value: 0.00869297981262207 key: test_mcc value: [0.2096648 0.56879646 0.29359034 0.31298622 0.32232919 0.65151515 0.40451992 0.01343038 0.54232614 0.29277002] mean value: 0.36119286228684955 key: train_mcc value: [0.42185455 0.44881052 0.49019032 0.45523737 0.44991626 0.44991626 0.4598332 0.51034181 0.41615085 0.43864549] mean value: 0.45408966367968817 key: test_accuracy value: [0.56521739 0.73913043 0.60869565 0.65217391 0.60869565 0.82608696 0.65217391 0.52173913 0.72727273 0.63636364] mean value: 0.6537549407114625 key: train_accuracy value: [0.65853659 0.69268293 0.71707317 0.69268293 0.68780488 0.68780488 
0.69756098 0.73658537 0.67961165 0.68932039] mean value: 0.6939663746152025 key: test_fscore value: [0.66666667 0.78571429 0.68965517 0.66666667 0.72727273 0.83333333 0.75 0.66666667 0.78571429 0.69230769] mean value: 0.7263997496756118 key: train_fscore value: [0.74452555 0.75675676 0.77165354 0.75862069 0.75384615 0.75384615 0.7578125 0.7768595 0.74418605 0.75193798] mean value: 0.7570044879996563 key: test_precision value: [0.52631579 0.64705882 0.55555556 0.61538462 0.57142857 0.83333333 0.6 0.52380952 0.64705882 0.6 ] mean value: 0.6119945036044108 key: train_precision value: [0.59649123 0.62820513 0.64900662 0.62658228 0.62025316 0.62025316 0.62987013 0.67142857 0.61935484 0.62580645] mean value: 0.6287251578008078 key: test_recall value: [0.90909091 1. 0.90909091 0.72727273 1. 0.83333333 1. 0.91666667 1. 0.81818182] mean value: 0.9113636363636364 key: train_recall value: [0.99029126 0.95145631 0.95145631 0.96116505 0.96078431 0.96078431 0.95098039 0.92156863 0.93203883 0.94174757] mean value: 0.9522272986864648 key: test_roc_auc value: [0.57954545 0.75 0.62121212 0.65530303 0.59090909 0.82575758 0.63636364 0.50378788 0.72727273 0.63636364] mean value: 0.6526515151515152 key: train_roc_auc value: [0.65691034 0.69141443 0.71592423 0.69136684 0.68913002 0.68913002 0.69879117 0.73748334 0.67961165 0.68932039] mean value: 0.693908242908814 key: test_jcc value: [0.5 0.64705882 0.52631579 0.5 0.57142857 0.71428571 0.6 0.5 0.64705882 0.52941176] mean value: 0.5735559486952676 key: train_jcc value: [0.59302326 0.60869565 0.62820513 0.61111111 0.60493827 0.60493827 0.61006289 0.63513514 0.59259259 0.60248447] mean value: 0.609118678337316 MCC on Blind test: 0.48 Accuracy on Blind test: 0.71 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0070138 0.00690508 0.00694633 0.00693202 0.00696373 0.00683618 0.00693846 0.0069778 0.0068872 0.00699329] mean value: 0.006939387321472168 key: score_time value: [0.00785279 0.00785279 0.00780988 0.00785208 0.0078671 0.00781655 0.00785685 0.00782061 0.0078702 0.00784469] mean value: 0.007844352722167968 key: test_mcc value: [ 0.39393939 0.06579517 -0.03816905 0.38932432 0.33946383 0.56490196 0.33946383 0.21452908 0.54772256 0.36514837] mean value: 0.318211945518085 key: train_mcc value: [0.39749865 0.36390677 0.37171873 0.369368 0.40852696 0.36225341 0.37286188 0.38354703 0.38043802 0.34401398] mean value: 0.37541334377025193 key: test_accuracy value: [0.69565217 0.52173913 0.47826087 0.69565217 0.65217391 0.7826087 0.65217391 0.60869565 0.77272727 0.68181818] mean value: 0.6541501976284585 key: train_accuracy value: [0.69756098 0.67804878 0.68292683 0.68292683 0.70243902 0.67804878 0.68292683 0.68780488 0.68932039 0.66504854] mean value: 0.6847051858868103 key: test_fscore value: [0.69565217 0.59259259 0.5 0.66666667 0.73333333 0.8 0.73333333 0.66666667 0.7826087 0.66666667] mean value: 0.6837520128824477 key: train_fscore value: [0.71559633 0.71052632 0.71111111 0.70588235 0.71889401 0.7027027 0.70852018 0.71428571 0.7037037 0.70638298] mean value: 0.7097605398121303 key: test_precision value: [0.66666667 0.5 0.46153846 0.7 0.61111111 
0.76923077 0.61111111 0.6 0.75 0.7 ] mean value: 0.6369658119658119 key: train_precision value: [0.67826087 0.648 0.6557377 0.66101695 0.67826087 0.65 0.65289256 0.6557377 0.67256637 0.62878788] mean value: 0.6581260910571809 key: test_recall value: [0.72727273 0.72727273 0.54545455 0.63636364 0.91666667 0.83333333 0.91666667 0.75 0.81818182 0.63636364] mean value: 0.7507575757575757 key: train_recall value: [0.75728155 0.78640777 0.77669903 0.75728155 0.76470588 0.76470588 0.7745098 0.78431373 0.73786408 0.80582524] mean value: 0.7709594517418618 key: test_roc_auc value: [0.6969697 0.53030303 0.48106061 0.69318182 0.64015152 0.78030303 0.64015152 0.60227273 0.77272727 0.68181818] mean value: 0.6518939393939394 key: train_roc_auc value: [0.69726823 0.67751761 0.68246716 0.68256235 0.70274129 0.67846945 0.68337141 0.68827337 0.68932039 0.66504854] mean value: 0.6847039786788501 key: test_jcc value: [0.53333333 0.42105263 0.33333333 0.5 0.57894737 0.66666667 0.57894737 0.5 0.64285714 0.5 ] mean value: 0.5255137844611529 key: train_jcc value: [0.55714286 0.55102041 0.55172414 0.54545455 0.56115108 0.54166667 0.54861111 0.55555556 0.54285714 0.54605263] mean value: 0.5501236135597817 MCC on Blind test: 0.47 Accuracy on Blind test: 0.73 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00674987 0.00651455 0.00655532 0.00653768 0.0065999 0.007195 0.00682569 0.00735092 0.00722623 0.0072701 ] mean value: 0.006882524490356446 key: score_time value: [0.01375031 0.0088625 0.00889778 0.01025677 0.00893879 0.00916982 0.00883722 0.00970483 0.00950575 0.00965738] mean value: 0.0097581148147583 key: test_mcc value: [0.21452908 0.39393939 0.33371191 0.48075018 0.39727608 0.12878788 0.25495628 0.30240737 0.48795004 0.18898224] mean value: 0.31832904389236555 key: train_mcc value: [0.63902904 0.59060621 0.60982579 0.63382493 0.67133261 0.60982579 0.68889027 0.67805807 0.59504408 0.6617241 ] mean value: 0.6378160892933147 key: test_accuracy value: [0.60869565 0.69565217 0.65217391 0.73913043 0.69565217 0.56521739 0.60869565 0.65217391 0.72727273 0.59090909] mean value: 0.6535573122529644 key: train_accuracy value: [0.8195122 0.79512195 0.80487805 0.81463415 0.83414634 0.80487805 0.84390244 0.83902439 0.7961165 0.83009709] mean value: 0.8182311153208619 key: test_fscore value: [0.52631579 0.69565217 0.69230769 0.7 0.74074074 0.58333333 0.52631579 0.69230769 0.66666667 0.52631579] mean value: 0.6349955667690221 key: train_fscore value: [0.82125604 0.8 0.80769231 0.80412371 0.82474227 0.8019802 0.83838384 0.83743842 0.78571429 0.83568075] mean value: 0.8157011822658049 key: test_precision value: [0.625 0.66666667 0.6 0.77777778 0.66666667 0.58333333 0.71428571 0.64285714 0.85714286 0.625 ] mean value: 0.6758730158730158 key: train_precision value: [0.81730769 0.78504673 0.8 0.85714286 0.86956522 0.81 0.86458333 
0.84158416 0.82795699 0.80909091] mean value: 0.8282277885901212 key: test_recall value: [0.45454545 0.72727273 0.81818182 0.63636364 0.83333333 0.58333333 0.41666667 0.75 0.54545455 0.45454545] mean value: 0.621969696969697 key: train_recall value: [0.82524272 0.81553398 0.81553398 0.75728155 0.78431373 0.79411765 0.81372549 0.83333333 0.74757282 0.86407767] mean value: 0.8050732914525033 key: test_roc_auc value: [0.60227273 0.6969697 0.65909091 0.73484848 0.68939394 0.56439394 0.61742424 0.64772727 0.72727273 0.59090909] mean value: 0.6530303030303031 key: train_roc_auc value: [0.8194841 0.79502189 0.80482581 0.81491529 0.83390444 0.80482581 0.84375595 0.83899676 0.7961165 0.83009709] mean value: 0.8181943651246907 key: test_jcc value: [0.35714286 0.53333333 0.52941176 0.53846154 0.58823529 0.41176471 0.35714286 0.52941176 0.5 0.35714286] mean value: 0.47020469726352077 key: train_jcc value: [0.69672131 0.66666667 0.67741935 0.67241379 0.70175439 0.66942149 0.72173913 0.72033898 0.64705882 0.71774194] mean value: 0.6891275872151366 MCC on Blind test: 0.3 Accuracy on Blind test: 0.65 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, 
colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.00949764 0.00926423 0.00878119 0.00982475 0.00883627 0.00911903 0.00922585 0.00873303 0.00967121 0.00942421] mean value: 0.00923774242401123 key: score_time value: [0.00828195 0.00830245 0.00834298 0.00822258 0.00852466 0.00888777 0.00881886 0.00821614 0.00887179 0.00847054] mean value: 0.008493971824645997 key: test_mcc value: [0.47727273 0.48856385 0.3030303 0.48075018 0.58002308 0.76764947 0.74047959 0.58002308 0.68313005 0.09090909] mean value: 0.5191831414788004 key: train_mcc value: [0.76709739 0.73662669 0.77590489 0.75693529 0.75611614 0.69845687 0.70790488 0.71798813 0.73789886 0.74813718] mean value: 0.7403066312362871 key: test_accuracy value: [0.73913043 0.73913043 0.65217391 0.73913043 0.7826087 0.86956522 0.86956522 0.7826087 0.81818182 0.54545455] mean value: 0.7537549407114624 key: train_accuracy value: [0.88292683 0.86829268 0.88780488 0.87804878 0.87804878 0.84878049 0.85365854 0.85853659 0.86893204 0.87378641] mean value: 0.8698816007577551 key: test_fscore value: [0.72727273 0.75 0.63636364 0.7 0.81481481 0.85714286 0.88 0.81481481 0.77777778 0.54545455] mean value: 0.7503641173641173 key: train_fscore value: [0.88679245 0.86829268 0.88995215 0.88151659 0.87684729 0.85167464 0.85576923 0.86124402 0.86829268 0.87619048] mean value: 0.8716572217358802 key: test_precision value: [0.72727273 0.69230769 0.63636364 0.77777778 0.73333333 1. 0.84615385 0.73333333 1. 
0.54545455] mean value: 0.7691996891996892 key: train_precision value: [0.86238532 0.87254902 0.87735849 0.86111111 0.88118812 0.8317757 0.83962264 0.8411215 0.87254902 0.85981308] mean value: 0.8599474002688899 key: test_recall value: [0.72727273 0.81818182 0.63636364 0.63636364 0.91666667 0.75 0.91666667 0.91666667 0.63636364 0.54545455] mean value: 0.75 key: train_recall value: [0.91262136 0.86407767 0.90291262 0.90291262 0.87254902 0.87254902 0.87254902 0.88235294 0.86407767 0.89320388] mean value: 0.8839805825242718 key: test_roc_auc value: [0.73863636 0.74242424 0.65151515 0.73484848 0.77651515 0.875 0.86742424 0.77651515 0.81818182 0.54545455] mean value: 0.7526515151515151 key: train_roc_auc value: [0.88278127 0.86831334 0.88773082 0.8779269 0.87802208 0.84889587 0.85375024 0.8586522 0.86893204 0.87378641] mean value: 0.8698791166952218 key: test_jcc value: [0.57142857 0.6 0.46666667 0.53846154 0.6875 0.75 0.78571429 0.6875 0.63636364 0.375 ] mean value: 0.6098634698634698 key: train_jcc value: [0.79661017 0.76724138 0.80172414 0.78813559 0.78070175 0.74166667 0.74789916 0.75630252 0.76724138 0.77966102] mean value: 0.7727183777937642 MCC on Blind test: 0.45 Accuracy on Blind test: 0.72 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.60935068 0.83438659 0.7129097 0.18024588 0.55924106 0.85111785 0.75755429 0.51584601 0.85538912 0.78311491] mean value: 0.6659156084060669 key: score_time value: [0.01095915 0.01519394 0.01093698 0.0109117 0.01094437 0.01403475 0.02076578 0.01094198 0.01319242 0.01348639] mean value: 0.013136744499206543 key: test_mcc value: [0.62050523 0.58930667 0.21969697 0.69084928 0.65151515 0.82575758 0.83743579 0.50168817 0.91287093 0.48795004] mean value: 0.6337575793672167 key: train_mcc value: [0.88361919 0.86356283 0.91435567 0.5161037 0.84404459 0.8742382 0.88447331 0.78922439 0.88349515 0.86407767] mean value: 0.831719470643712 key: test_accuracy value: [0.7826087 0.7826087 0.60869565 0.82608696 0.82608696 0.91304348 0.91304348 0.73913043 0.95454545 0.72727273] mean value: 0.8073122529644269 key: train_accuracy value: [0.94146341 0.93170732 0.95609756 0.75121951 0.92195122 0.93658537 0.94146341 0.89268293 0.94174757 0.93203883] mean value: 0.9146957139474308 key: test_fscore value: [0.70588235 0.8 0.60869565 0.77777778 0.83333333 0.91666667 0.92307692 0.78571429 0.95238095 0.66666667] mean value: 0.7970194610731695 key: train_fscore value: [0.94059406 0.93269231 0.95477387 0.72131148 0.92079208 0.93779904 0.94285714 0.89719626 0.94174757 0.93203883] mean value: 0.9121802646431316 key: test_precision value: [1. 0.71428571 0.58333333 1. 0.83333333 0.91666667 0.85714286 0.6875 1. 
0.85714286] mean value: 0.8449404761904762 key: train_precision value: [0.95959596 0.92380952 0.98958333 0.825 0.93 0.91588785 0.91666667 0.85714286 0.94174757 0.93203883] mean value: 0.9191472598782621 key: test_recall value: [0.54545455 0.90909091 0.63636364 0.63636364 0.83333333 0.91666667 1. 0.91666667 0.90909091 0.54545455] mean value: 0.7848484848484848 key: train_recall value: [0.9223301 0.94174757 0.9223301 0.6407767 0.91176471 0.96078431 0.97058824 0.94117647 0.94174757 0.93203883] mean value: 0.9085284599276604 key: test_roc_auc value: [0.77272727 0.78787879 0.60984848 0.81818182 0.82575758 0.91287879 0.90909091 0.73106061 0.95454545 0.72727273] mean value: 0.8049242424242424 key: train_roc_auc value: [0.94155721 0.9316581 0.95626309 0.7517609 0.92190177 0.93670284 0.9416048 0.89291833 0.94174757 0.93203883] mean value: 0.9148153436131735 key: test_jcc value: [0.54545455 0.66666667 0.4375 0.63636364 0.71428571 0.84615385 0.85714286 0.64705882 0.90909091 0.5 ] mean value: 0.6759716998687587 key: train_jcc value: [0.88785047 0.87387387 0.91346154 0.56410256 0.85321101 0.88288288 0.89189189 0.81355932 0.88990826 0.87272727] mean value: 0.8443469079318687 MCC on Blind test: 0.32 Accuracy on Blind test: 0.66 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01073289 0.01022744 0.00823569 0.00801015 0.00778437 0.00778627 0.0077095 0.00775599 0.00761223 0.00784159] mean value: 0.00836961269378662 key: score_time value: [0.01050305 0.00813007 0.00807691 0.00800776 0.00780082 0.00781107 0.00772762 0.00769615 0.00769758 0.00771093] mean value: 0.008116197586059571 key: test_mcc value: [0.91666667 0.58930667 0.76277007 0.83743579 0.82575758 0.83971912 1. 0.91666667 0.81818182 0.73029674] mean value: 0.8236801120713376 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95652174 0.7826087 0.86956522 0.91304348 0.91304348 0.91304348 1. 0.95652174 0.90909091 0.86363636] mean value: 0.9077075098814229 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95652174 0.8 0.84210526 0.9 0.91666667 0.90909091 1. 0.95652174 0.90909091 0.85714286] mean value: 0.9047140083410107 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91666667 0.71428571 1. 1. 0.91666667 1. 1. 1. 0.90909091 0.9 ] mean value: 0.9356709956709957 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.90909091 0.72727273 0.81818182 0.91666667 0.83333333 1. 0.91666667 0.90909091 0.81818182] mean value: 0.8848484848484849 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95833333 0.78787879 0.86363636 0.90909091 0.91287879 0.91666667 1. 
0.95833333 0.90909091 0.86363636] mean value: 0.9079545454545455 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.91666667 0.66666667 0.72727273 0.81818182 0.84615385 0.83333333 1. 0.91666667 0.83333333 0.75 ] mean value: 0.8308275058275059 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.04 Accuracy on Blind test: 0.51 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, 
random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.0877099 0.08413386 0.08387136 0.08425379 0.08437729 0.0846684 0.08445048 0.08430481 0.08534908 0.08467293] mean value: 0.08477919101715088 key: score_time value: [0.01659155 0.01651287 0.01647377 0.01637912 0.01636243 0.01650667 0.01641607 0.01648188 0.01631093 0.01695395] mean value: 0.016498923301696777 key: test_mcc value: [0.48075018 0.76764947 0.66414149 0.91605722 0.74047959 1. 1. 1. 0.81818182 0.81818182] mean value: 0.8205441588214263 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73913043 0.86956522 0.82608696 0.95652174 0.86956522 1. 1. 1. 
0.90909091 0.90909091] mean value: 0.9079051383399209 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.7 0.88 0.83333333 0.95238095 0.88 1. 1. 1. 0.90909091 0.90909091] mean value: 0.9063896103896104 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.77777778 0.78571429 0.76923077 1. 0.84615385 1. 1. 1. 0.90909091 0.90909091] mean value: 0.8997058497058497 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.63636364 1. 0.90909091 0.90909091 0.91666667 1. 1. 1. 0.90909091 0.90909091] mean value: 0.918939393939394 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.73484848 0.875 0.82954545 0.95454545 0.86742424 1. 1. 1. 0.90909091 0.90909091] mean value: 0.9079545454545455 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.53846154 0.78571429 0.71428571 0.90909091 0.78571429 1. 1. 1. 0.83333333 0.83333333] mean value: 0.83999333999334 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.33 Accuracy on Blind test: 0.63 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00697589 0.00684834 0.00684047 0.00689197 0.00688601 0.00683427 0.00684476 0.00691366 0.00708032 0.00721502] mean value: 0.006933069229125977 key: score_time value: [0.00782681 0.00778484 0.00773478 0.00783086 0.0077734 0.00773144 0.00774217 0.00777936 0.00781941 0.00777721] mean value: 0.007780027389526367 key: test_mcc value: [0.39393939 0.66414149 0.03816905 0.50168817 0.47727273 0.44411739 0.83971912 0.31252706 0.68313005 0.36514837] mean value: 0.4719852822351723 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.69565217 0.82608696 0.52173913 0.73913043 0.73913043 0.69565217 0.91304348 0.65217391 0.81818182 0.68181818] mean value: 0.7282608695652174 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.69565217 0.83333333 0.47619048 0.66666667 0.75 0.63157895 0.90909091 0.71428571 0.77777778 0.69565217] mean value: 0.7150228172539385 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.66666667 0.76923077 0.5 0.85714286 0.75 0.85714286 1. 0.625 1. 0.66666667] mean value: 0.7691849816849816 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.90909091 0.45454545 0.54545455 0.75 0.5 0.83333333 0.83333333 0.63636364 0.72727273] mean value: 0.6916666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.6969697 0.82954545 0.51893939 0.73106061 0.73863636 0.70454545 0.91666667 0.64393939 0.81818182 0.68181818] mean value: 0.728030303030303 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.53333333 0.71428571 0.3125 0.5 0.6 0.46153846 0.83333333 0.55555556 0.63636364 0.53333333] mean value: 0.5680243367743367 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.24 Accuracy on Blind test: 0.62 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.09106922 1.09859252 1.08210254 1.08648729 1.08477712 1.10239196 1.09634829 1.08665371 1.15152287 1.16227579] mean value: 1.1042221307754516 key: score_time value: [0.09068799 0.14433622 0.09434557 0.09718037 0.09297609 0.09529161 0.09361553 0.08816409 0.09734464 0.0969758 ] mean value: 0.09909179210662841 key: test_mcc value: [0.74047959 0.6992059 0.66414149 1. 0.91605722 0.91666667 1. 1. 0.91287093 0.91287093] mean value: 0.8762292726696667 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.82608696 0.82608696 1. 0.95652174 0.95652174 1. 1. 0.95454545 0.95454545] mean value: 0.9343873517786562 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.84615385 0.83333333 1. 0.96 0.95652174 1. 1. 0.95652174 0.95652174] mean value: 0.9366195254021341 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9 0.73333333 0.76923077 1. 0.92307692 1. 1. 1. 0.91666667 0.91666667] mean value: 0.9158974358974359 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 1. 0.90909091 1. 1. 0.91666667 1. 1. 1. 1. ] mean value: 0.9643939393939394 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86742424 0.83333333 0.82954545 1. 0.95454545 0.95833333 1. 1. 0.95454545 0.95454545] mean value: 0.9352272727272727 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.75 0.73333333 0.71428571 1. 0.92307692 0.91666667 1. 1. 0.91666667 0.91666667] mean value: 0.8870695970695971 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.14 Accuracy on Blind test: 0.55 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.84751892 0.83743048 0.99197769 0.88181758 0.90272188 0.897192 0.83777547 0.89692569 0.9466176 0.89896107] mean value: 0.893893837928772 key: score_time value: [0.18853641 0.20011663 0.1846509 0.19059825 0.2371254 0.21251607 0.20379901 0.20380235 0.17668462 0.19526935] mean value: 0.19930989742279054 key: test_mcc value: [0.65909298 0.6992059 0.58930667 0.76277007 0.83743579 0.83971912 0.91605722 0.91605722 0.91287093 0.91287093] mean value: 0.8045386839254043 key: train_mcc value: [0.90516294 0.94216887 0.93386476 0.91325992 0.92355447 0.90523324 0.90523324 0.88720829 0.92389898 0.91473626] mean value: 0.915432096195379 key: test_accuracy value: [0.82608696 0.82608696 0.7826087 0.86956522 0.91304348 0.91304348 0.95652174 0.95652174 0.95454545 0.95454545] mean value: 0.8952569169960475 key: train_accuracy value: [0.95121951 0.97073171 0.96585366 0.95609756 0.96097561 0.95121951 0.95121951 0.94146341 0.96116505 0.95631068] mean value: 0.9566256215960217 key: test_fscore value: [0.8 0.84615385 0.8 0.84210526 0.92307692 0.90909091 0.96 0.96 0.95652174 0.95652174] mean value: 0.8953470419740442 key: train_fscore value: [0.95327103 0.97142857 0.96713615 0.95734597 0.96190476 0.95283019 0.95283019 0.94392523 0.96226415 0.95774648] mean value: 0.9580682723989425 key: test_precision value: [0.88888889 0.73333333 0.71428571 1. 0.85714286 1. 
0.92307692 0.92307692 0.91666667 0.91666667] mean value: 0.8873137973137973 key: train_precision value: [0.91891892 0.95327103 0.93636364 0.93518519 0.93518519 0.91818182 0.91818182 0.90178571 0.93577982 0.92727273] mean value: 0.9280125848126148 key: test_recall value: [0.72727273 1. 0.90909091 0.72727273 1. 0.83333333 1. 1. 1. 1. ] mean value: 0.9196969696969697 key: train_recall value: [0.99029126 0.99029126 1. 0.98058252 0.99019608 0.99019608 0.99019608 0.99019608 0.99029126 0.99029126] mean value: 0.9902531886541024 key: test_roc_auc value: [0.8219697 0.83333333 0.78787879 0.86363636 0.90909091 0.91666667 0.95454545 0.95454545 0.95454545 0.95454545] mean value: 0.8950757575757575 key: train_roc_auc value: [0.95102798 0.97063583 0.96568627 0.95597754 0.96111746 0.95140872 0.95140872 0.94169998 0.96116505 0.95631068] mean value: 0.9566438225775747 key: test_jcc value: [0.66666667 0.73333333 0.66666667 0.72727273 0.85714286 0.83333333 0.92307692 0.92307692 0.91666667 0.91666667] mean value: 0.8163902763902764 key: train_jcc value: [0.91071429 0.94444444 0.93636364 0.91818182 0.9266055 0.90990991 0.90990991 0.89380531 0.92727273 0.91891892] mean value: 0.919612646503732 MCC on Blind test: 0.34 Accuracy on Blind test: 0.64 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02035141 0.00765848 0.00785589 0.00786734 0.00790954 0.0078063 0.00794077 0.00780129 0.00784397 0.00801635] mean value: 0.00910513401031494 key: score_time value: [0.01214242 0.00866485 0.00880075 0.00867152 0.00868607 0.00868559 0.0086391 0.00873423 0.00867414 0.00874686] mean value: 0.009044551849365234 key: test_mcc value: [ 0.39393939 0.06579517 -0.03816905 0.38932432 0.33946383 0.56490196 0.33946383 0.21452908 0.54772256 0.36514837] mean value: 0.318211945518085 key: train_mcc value: [0.39749865 0.36390677 0.37171873 0.369368 0.40852696 0.36225341 0.37286188 0.38354703 0.38043802 0.34401398] mean value: 0.37541334377025193 key: test_accuracy value: [0.69565217 0.52173913 0.47826087 0.69565217 0.65217391 0.7826087 0.65217391 0.60869565 0.77272727 0.68181818] mean value: 0.6541501976284585 key: train_accuracy value: [0.69756098 0.67804878 0.68292683 0.68292683 0.70243902 0.67804878 0.68292683 0.68780488 0.68932039 0.66504854] mean value: 0.6847051858868103 key: test_fscore value: [0.69565217 0.59259259 0.5 0.66666667 0.73333333 0.8 0.73333333 0.66666667 0.7826087 0.66666667] mean value: 0.6837520128824477 key: train_fscore value: [0.71559633 0.71052632 0.71111111 0.70588235 0.71889401 0.7027027 0.70852018 0.71428571 0.7037037 0.70638298] mean value: 0.7097605398121303 key: test_precision value: [0.66666667 0.5 0.46153846 0.7 0.61111111 0.76923077 0.61111111 0.6 0.75 0.7 ] mean value: 0.6369658119658119 key: train_precision value: [0.67826087 0.648 0.6557377 0.66101695 0.67826087 0.65 0.65289256 0.6557377 0.67256637 
0.62878788] mean value: 0.6581260910571809 key: test_recall value: [0.72727273 0.72727273 0.54545455 0.63636364 0.91666667 0.83333333 0.91666667 0.75 0.81818182 0.63636364] mean value: 0.7507575757575757 key: train_recall value: [0.75728155 0.78640777 0.77669903 0.75728155 0.76470588 0.76470588 0.7745098 0.78431373 0.73786408 0.80582524] mean value: 0.7709594517418618 key: test_roc_auc value: [0.6969697 0.53030303 0.48106061 0.69318182 0.64015152 0.78030303 0.64015152 0.60227273 0.77272727 0.68181818] mean value: 0.6518939393939394 key: train_roc_auc value: [0.69726823 0.67751761 0.68246716 0.68256235 0.70274129 0.67846945 0.68337141 0.68827337 0.68932039 0.66504854] mean value: 0.6847039786788501 key: test_jcc value: [0.53333333 0.42105263 0.33333333 0.5 0.57894737 0.66666667 0.57894737 0.5 0.64285714 0.5 ] mean value: 0.5255137844611529 key: train_jcc value: [0.55714286 0.55102041 0.55172414 0.54545455 0.56115108 0.54166667 0.54861111 0.55555556 0.54285714 0.54605263] mean value: 0.5501236135597817 MCC on Blind test: 0.47 Accuracy on Blind test: 0.73 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.09073043 0.0420742 0.04870391 0.04456186 0.04261518 0.04289222 0.04772949 0.04729652 0.04701734 0.04742146] mean value: 0.05010426044464111 key: score_time value: [0.00988626 0.01027012 0.01072621 0.00994897 0.00984526 0.0098393 0.01010942 0.01019359 0.01010847 0.01034665] mean value: 0.010127425193786621 key: test_mcc value: [0.82575758 0.6992059 0.74242424 0.91605722 0.74242424 0.91666667 0.91605722 1. 0.91287093 0.81818182] mean value: 0.8489645823067301 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91304348 0.82608696 0.86956522 0.95652174 0.86956522 0.95652174 0.95652174 1. 0.95454545 0.90909091] mean value: 0.9211462450592885 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.84615385 0.86956522 0.95238095 0.86956522 0.95652174 0.96 1. 0.95652174 0.90909091] mean value: 0.9228890529760094 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.90909091 0.73333333 0.83333333 1. 0.90909091 1. 0.92307692 1. 0.91666667 0.90909091] mean value: 0.9133682983682984 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 1. 0.90909091 0.90909091 0.83333333 0.91666667 1. 1. 1. 0.90909091] mean value: 0.9386363636363636 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.91287879 0.83333333 0.87121212 0.95454545 0.87121212 0.95833333 0.95454545 1. 0.95454545 0.90909091] mean value: 0.921969696969697 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.73333333 0.76923077 0.90909091 0.76923077 0.91666667 0.92307692 1. 0.91666667 0.83333333] mean value: 0.8603962703962704 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.07 Accuracy on Blind test: 0.52 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), 
('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01050711 0.02742529 0.02827191 0.03134918 0.03139067 0.03144336 0.03741717 0.03171349 0.02641296 0.03151488] mean value: 0.028744602203369142 key: score_time value: [0.01013732 0.0210259 0.01963973 0.01852012 0.0103941 0.02017403 0.01831055 0.01890373 0.02068543 0.01523781] mean value: 0.017302870750427246 key: test_mcc value: [0.69084928 0.65151515 0.39393939 0.91605722 0.65151515 0.91666667 0.74047959 0.91605722 0.75592895 0.91287093] mean value: 0.7545879558265087 key: train_mcc value: [0.86404384 0.86356283 0.9024367 0.83417421 0.87321531 0.83418999 0.85370265 0.86358877 0.86407767 0.8544092 ] mean value: 
0.8607401153973653 key: test_accuracy value: [0.82608696 0.82608696 0.69565217 0.95652174 0.82608696 0.95652174 0.86956522 0.95652174 0.86363636 0.95454545] mean value: 0.8731225296442688 key: train_accuracy value: [0.93170732 0.93170732 0.95121951 0.91707317 0.93658537 0.91707317 0.92682927 0.93170732 0.93203883 0.92718447] mean value: 0.9303125739995264 key: test_fscore value: [0.77777778 0.81818182 0.69565217 0.95238095 0.83333333 0.95652174 0.88 0.96 0.84210526 0.95238095] mean value: 0.8668334010256207 key: train_fscore value: [0.93333333 0.93269231 0.95145631 0.9178744 0.93658537 0.91707317 0.92682927 0.93203883 0.93203883 0.92682927] mean value: 0.9306751090914163 key: test_precision value: [1. 0.81818182 0.66666667 1. 0.83333333 1. 0.84615385 0.92307692 1. 1. ] mean value: 0.9087412587412588 key: train_precision value: [0.91588785 0.92380952 0.95145631 0.91346154 0.93203883 0.91262136 0.9223301 0.92307692 0.93203883 0.93137255] mean value: 0.9258093821728087 key: test_recall value: [0.63636364 0.81818182 0.72727273 0.90909091 0.83333333 0.91666667 0.91666667 1. 
0.72727273 0.90909091] mean value: 0.8393939393939394 key: train_recall value: [0.95145631 0.94174757 0.95145631 0.9223301 0.94117647 0.92156863 0.93137255 0.94117647 0.93203883 0.9223301 ] mean value: 0.935665334094803 key: test_roc_auc value: [0.81818182 0.82575758 0.6969697 0.95454545 0.82575758 0.95833333 0.86742424 0.95454545 0.86363636 0.95454545] mean value: 0.871969696969697 key: train_roc_auc value: [0.93161051 0.9316581 0.95121835 0.9170474 0.93660765 0.91709499 0.92685132 0.93175328 0.93203883 0.92718447] mean value: 0.9303064915286503 key: test_jcc value: [0.63636364 0.69230769 0.53333333 0.90909091 0.71428571 0.91666667 0.78571429 0.92307692 0.72727273 0.90909091] mean value: 0.7747202797202797 key: train_jcc value: [0.875 0.87387387 0.90740741 0.84821429 0.88073394 0.84684685 0.86363636 0.87272727 0.87272727 0.86363636] mean value: 0.8704803631523815 MCC on Blind test: 0.1 Accuracy on Blind test: 0.55 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, 
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.00954151 0.00753498 0.0070641 0.00683141 0.00684738 0.00718474 0.00700259 0.00671387 0.00674605 0.00678587] mean value: 0.007225251197814942 key: score_time value: [0.00951552 0.00836635 0.0077734 0.00772834 0.00780702 0.00820851 0.00769925 0.00763559 0.00767422 0.00770903] mean value: 0.008011722564697265 key: test_mcc value: [ 0.38932432 0.58930667 0.23262105 0.38932432 0.38932432 0.66414149 0.56490196 -0.06579517 0.29277002 0.36514837] mean value: 0.3811067348212412 key: train_mcc value: [0.48336719 0.44537263 0.49337247 0.42577585 0.49527272 0.41611143 0.48421652 0.47567594 0.44763689 0.43896694] mean value: 0.4605768597645512 key: test_accuracy value: [0.69565217 0.7826087 0.60869565 0.69565217 0.69565217 0.82608696 0.7826087 0.47826087 0.63636364 0.68181818] mean value: 0.6883399209486166 key: train_accuracy value: [0.74146341 0.72195122 0.74634146 0.71219512 0.74634146 0.70731707 0.74146341 0.73658537 0.72330097 0.7184466 ] mean value: 0.7295406109400899 key: test_fscore value: [0.66666667 0.8 0.64 0.66666667 0.72 0.81818182 0.8 0.57142857 0.69230769 0.66666667] mean value: 0.7041918081918082 key: train_fscore value: [0.74881517 0.73488372 0.75471698 0.7255814 0.75700935 0.71698113 0.74881517 0.74766355 0.73239437 0.73148148] mean value: 0.7398342306115098 key: test_precision value: [0.7 0.71428571 0.57142857 0.7 0.69230769 0.9 0.76923077 0.5 0.6 0.7 ] mean value: 0.6847252747252747 key: train_precision value: [0.73148148 0.70535714 0.73394495 0.69642857 0.72321429 0.69090909 0.72477064 0.71428571 0.70909091 0.69911504] 
mean value: 0.7128597836345258 key: test_recall value: [0.63636364 0.90909091 0.72727273 0.63636364 0.75 0.75 0.83333333 0.66666667 0.81818182 0.63636364] mean value: 0.7363636363636363 key: train_recall value: [0.76699029 0.76699029 0.77669903 0.75728155 0.79411765 0.74509804 0.7745098 0.78431373 0.75728155 0.76699029] mean value: 0.7690272225395012 key: test_roc_auc value: [0.69318182 0.78787879 0.61363636 0.69318182 0.69318182 0.82954545 0.78030303 0.46969697 0.63636364 0.68181818] mean value: 0.6878787878787879 key: train_roc_auc value: [0.74133828 0.72173044 0.74619265 0.71197411 0.74657339 0.70750048 0.74162383 0.73681706 0.72330097 0.7184466 ] mean value: 0.7295497810774796 key: test_jcc value: [0.5 0.66666667 0.47058824 0.5 0.5625 0.69230769 0.66666667 0.4 0.52941176 0.5 ] mean value: 0.5488141025641026 key: train_jcc value: [0.59848485 0.58088235 0.60606061 0.56934307 0.60902256 0.55882353 0.59848485 0.59701493 0.57777778 0.57664234] mean value: 0.5872536846384988 MCC on Blind test: 0.43 Accuracy on Blind test: 0.71 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, 
colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00825429 0.01047587 0.01012659 0.01016808 0.01010203 0.00998354 0.01053905 0.01024556 0.01085734 0.01041651] mean value: 0.010116887092590333 key: score_time value: [0.00801587 0.01022434 0.01021957 0.01031947 0.01033974 0.01022935 0.01024055 0.01027536 0.01019835 0.01020384] mean value: 0.010026645660400391 key: test_mcc value: [0.65909298 0.42228828 0.37057951 0.76764947 0.76277007 0.74242424 0.55048188 0.40451992 0.91287093 0.75592895] mean value: 0.6348606236325591 key: train_mcc value: [0.85690497 0.73153872 0.87320324 0.70302948 0.80930285 0.8300002 0.72436632 0.58762141 0.80469539 0.81866523] mean value: 0.7739327829063736 key: test_accuracy value: [0.82608696 0.69565217 0.65217391 0.86956522 0.86956522 0.86956522 0.73913043 0.65217391 0.95454545 0.86363636] mean value: 0.799209486166008 key: train_accuracy value: [0.92682927 0.85365854 0.93658537 0.83414634 0.89756098 0.91219512 0.84390244 0.75609756 0.89805825 0.90776699] mean value: 0.8766800852474544 key: test_fscore value: [0.8 0.58823529 0.71428571 0.88 0.88888889 0.86956522 0.8 0.75 0.95238095 0.88 ] mean value: 0.8123356067064507 key: train_fscore value: [0.93023256 0.83333333 0.93719807 0.85714286 0.9058296 0.90625 0.86440678 0.80314961 0.89005236 0.91162791] mean value: 0.8839223061619048 key: test_precision value: [0.88888889 0.83333333 0.58823529 0.78571429 0.8 0.90909091 0.66666667 0.6 1. 
0.78571429] mean value: 0.7857643663526016 key: train_precision value: [0.89285714 0.97402597 0.93269231 0.75555556 0.83471074 0.96666667 0.76119403 0.67105263 0.96590909 0.875 ] mean value: 0.8629664142938084 key: test_recall value: [0.72727273 0.45454545 0.90909091 1. 1. 0.83333333 1. 1. 0.90909091 1. ] mean value: 0.8833333333333333 key: train_recall value: [0.97087379 0.72815534 0.94174757 0.99029126 0.99019608 0.85294118 1. 1. 0.82524272 0.95145631] mean value: 0.9250904245193223 key: test_roc_auc value: [0.8219697 0.68560606 0.66287879 0.875 0.86363636 0.87121212 0.72727273 0.63636364 0.95454545 0.86363636] mean value: 0.7962121212121211 key: train_roc_auc value: [0.92661336 0.85427375 0.93656006 0.83338093 0.89801066 0.91190748 0.84466019 0.75728155 0.89805825 0.90776699] mean value: 0.8768513230534933 key: test_jcc value: [0.66666667 0.41666667 0.55555556 0.78571429 0.8 0.76923077 0.66666667 0.6 0.90909091 0.78571429] mean value: 0.6955305805305805 key: train_jcc value: [0.86956522 0.71428571 0.88181818 0.75 0.82786885 0.82857143 0.76119403 0.67105263 0.80188679 0.83760684] mean value: 0.7943849686015007 MCC on Blind test: 0.24 Accuracy on Blind test: 0.62 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00993443 0.01050615 0.00986814 0.00974298 0.01031899 0.01094747 0.00993729 0.01018286 0.01007271 0.01059127] mean value: 0.010210227966308594 key: score_time value: [0.01083398 0.01023293 0.01024818 0.01021266 0.01024604 0.01035571 0.01024151 0.01023602 0.01023817 0.01028705] mean value: 0.01031322479248047 key: test_mcc value: [0.56490196 0.58930667 0.48075018 1. 0.47923384 0.82575758 0.76277007 0.58930667 0.83205029 0.63636364] mean value: 0.6760440880009018 key: train_mcc value: [0.78910244 0.834498 0.74442173 0.81555702 0.74362503 0.82697375 0.85470694 0.87321531 0.82432211 0.83815726] mean value: 0.814457959189338 key: test_accuracy value: [0.7826087 0.7826087 0.73913043 1. 0.69565217 0.91304348 0.86956522 0.7826087 0.90909091 0.81818182] mean value: 0.8292490118577075 key: train_accuracy value: [0.88780488 0.91219512 0.86341463 0.90731707 0.85853659 0.91219512 0.92682927 0.93658537 0.90776699 0.91747573] mean value: 0.903012076722709 key: test_fscore value: [0.76190476 0.8 0.7 1. 0.77419355 0.91666667 0.88888889 0.76190476 0.91666667 0.81818182] mean value: 0.8338407112600661 key: train_fscore value: [0.89777778 0.91891892 0.84782609 0.90995261 0.87445887 0.91509434 0.92822967 0.93658537 0.91402715 0.91370558] mean value: 0.9056576368372846 key: test_precision value: [0.8 0.71428571 0.77777778 1. 
0.63157895 0.91666667 0.8 0.88888889 0.84615385 0.81818182] mean value: 0.8193533659323133 key: train_precision value: [0.82786885 0.85714286 0.96296296 0.88888889 0.78294574 0.88181818 0.90654206 0.93203883 0.8559322 0.95744681] mean value: 0.8853587382632707 key: test_recall value: [0.72727273 0.90909091 0.63636364 1. 1. 0.91666667 1. 0.66666667 1. 0.81818182] mean value: 0.8674242424242424 key: train_recall value: [0.98058252 0.99029126 0.75728155 0.93203883 0.99019608 0.95098039 0.95098039 0.94117647 0.98058252 0.87378641] mean value: 0.934789644012945 key: test_roc_auc value: [0.78030303 0.78787879 0.73484848 1. 0.68181818 0.91287879 0.86363636 0.78787879 0.90909091 0.81818182] mean value: 0.8276515151515151 key: train_roc_auc value: [0.88735009 0.9118123 0.86393489 0.90719589 0.85917571 0.9123834 0.92694651 0.93660765 0.90776699 0.91747573] mean value: 0.903064915286503 key: test_jcc value: [0.61538462 0.66666667 0.53846154 1. 0.63157895 0.84615385 0.8 0.61538462 0.84615385 0.69230769] mean value: 0.7252091767881241 key: train_jcc value: [0.81451613 0.85 0.73584906 0.83478261 0.77692308 0.84347826 0.86607143 0.88073394 0.84166667 0.8411215 ] mean value: 0.8285142667643652 MCC on Blind test: 0.15 Accuracy on Blind test: 0.57 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.09048128 0.06841111 0.06838012 0.06843686 0.06842685 0.06856799 0.06842709 0.06839943 0.06843877 0.06855869] mean value: 0.07065281867980958 key: score_time value: [0.01429415 0.01409531 0.0138483 0.01383138 0.01385355 0.01397872 0.01384473 0.01389337 0.01391649 0.01385307] mean value: 0.013940906524658203 key: test_mcc value: [0.82575758 0.58930667 0.74242424 0.83743579 0.74242424 0.91666667 0.91605722 0.91666667 0.73029674 0.81818182] mean value: 0.8035217636234444 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91304348 0.7826087 0.86956522 0.91304348 0.86956522 0.95652174 0.95652174 0.95652174 0.86363636 0.90909091] mean value: 0.8990118577075099 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.8 0.86956522 0.9 0.86956522 0.95652174 0.96 0.95652174 0.85714286 0.90909091] mean value: 0.8987498588368154 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.90909091 0.71428571 0.83333333 1. 0.90909091 1. 0.92307692 1. 0.9 0.90909091] mean value: 0.9097968697968698 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.90909091 0.90909091 0.81818182 0.83333333 0.91666667 1. 0.91666667 0.81818182 0.90909091] mean value: 0.8939393939393939 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.91287879 0.78787879 0.87121212 0.90909091 0.87121212 0.95833333 0.95454545 0.95833333 0.86363636 0.90909091] mean value: 0.8996212121212122 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.66666667 0.76923077 0.81818182 0.76923077 0.91666667 0.92307692 0.91666667 0.75 0.83333333] mean value: 0.8196386946386947 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.04 Accuracy on Blind test: 0.51 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.03047633 0.02374792 0.02200127 0.03589177 0.03106284 0.02499914 0.04321909 0.03124619 0.04236913 0.03466034] mean value: 0.0319674015045166 key: score_time value: [0.02318501 0.01713753 0.02661037 0.02022576 0.02133465 0.01644993 0.08056641 0.02024651 0.02388358 0.01804686] mean value: 0.02676866054534912 key: test_mcc value: [0.91605722 0.58930667 0.65151515 1. 0.74242424 0.91666667 0.91605722 0.91666667 0.91287093 0.91287093] mean value: 0.8474435701866356 key: train_mcc value: [0.99029126 0.98048734 1. 1. 0.99029034 1. 
0.99029034 0.99029034 0.9613463 0.98076744] mean value: 0.9883763362991548 key: test_accuracy value: [0.95652174 0.7826087 0.82608696 1. 0.86956522 0.95652174 0.95652174 0.95652174 0.95454545 0.95454545] mean value: 0.9213438735177866 key: train_accuracy value: [0.99512195 0.9902439 1. 1. 0.99512195 1. 0.99512195 0.99512195 0.98058252 0.99029126] mean value: 0.994160549372484 key: test_fscore value: [0.95238095 0.8 0.81818182 1. 0.86956522 0.95652174 0.96 0.95652174 0.95652174 0.95652174] mean value: 0.9226214944475815 key: train_fscore value: [0.99512195 0.99029126 1. 1. 0.99507389 1. 0.99507389 0.99507389 0.98039216 0.99019608] mean value: 0.9941223123526399 key: test_precision value: [1. 0.71428571 0.81818182 1. 0.90909091 1. 0.92307692 1. 0.91666667 0.91666667] mean value: 0.9197968697968698 key: train_precision value: [1. 0.99029126 1. 1. 1. 1. 1. 1. 0.99009901 1. ] mean value: 0.9980390272036912 key: test_recall value: [0.90909091 0.90909091 0.81818182 1. 0.83333333 0.91666667 1. 0.91666667 1. 1. ] mean value: 0.9303030303030303 key: train_recall value: [0.99029126 0.99029126 1. 1. 0.99019608 1. 0.99019608 0.99019608 0.97087379 0.98058252] mean value: 0.9902627070245574 key: test_roc_auc value: [0.95454545 0.78787879 0.82575758 1. 0.87121212 0.95833333 0.95454545 0.95833333 0.95454545 0.95454545] mean value: 0.921969696969697 key: train_roc_auc value: [0.99514563 0.99024367 1. 1. 0.99509804 1. 0.99509804 0.99509804 0.98058252 0.99029126] mean value: 0.9941557205406435 key: test_jcc value: [0.90909091 0.66666667 0.69230769 1. 0.76923077 0.91666667 0.92307692 0.91666667 0.91666667 0.91666667] mean value: 0.8627039627039627 key: train_jcc value: [0.99029126 0.98076923 1. 1. 0.99019608 1. 
0.99019608 0.99019608 0.96153846 0.98058252] mean value: 0.9883769714009577 MCC on Blind test: 0.14 Accuracy on Blind test: 0.55 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient 
Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.0428648 0.05020165 0.05070519 0.05125546 0.05086875 0.05098724 0.05125928 0.05102658 0.0502708 0.05003095] mean value: 0.049947071075439456 key: score_time value: [0.02246642 0.02030492 0.02142429 0.02147484 0.02218461 0.02291584 0.02219915 0.02224135 0.01964617 0.02045941] mean value: 0.02153170108795166 key: test_mcc value: [0.48075018 0.48856385 0.39393939 0.65909298 0.56490196 0.6992059 0.91666667 0.38932432 0.68313005 0.18257419] mean value: 0.5458149483128502 key: train_mcc value: [0.94306341 0.92211753 0.9024367 0.90261781 0.92211753 0.9024367 0.93175328 0.91224062 0.90291262 0.88366175] mean value: 0.9125357952633087 key: test_accuracy value: [0.73913043 0.73913043 0.69565217 0.82608696 0.7826087 0.82608696 0.95652174 0.69565217 0.81818182 0.59090909] mean value: 0.76699604743083 key: train_accuracy value: [0.97073171 0.96097561 0.95121951 0.95121951 0.96097561 0.95121951 0.96585366 0.95609756 0.95145631 0.94174757] mean value: 0.9561496566421974 key: test_fscore 
value: [0.7 0.75 0.69565217 0.8 0.8 0.8 0.95652174 0.72 0.77777778 0.57142857] mean value: 0.7571380262249827 key: train_fscore value: [0.97169811 0.96153846 0.95145631 0.95098039 0.96039604 0.95098039 0.96585366 0.95609756 0.95145631 0.94230769] mean value: 0.9562764931842805 key: test_precision value: [0.77777778 0.69230769 0.66666667 0.88888889 0.76923077 1. 1. 0.69230769 1. 0.6 ] mean value: 0.8087179487179487 key: train_precision value: [0.94495413 0.95238095 0.95145631 0.96039604 0.97 0.95098039 0.96116505 0.95145631 0.95145631 0.93333333] mean value: 0.9527578826498 key: test_recall value: [0.63636364 0.81818182 0.72727273 0.72727273 0.83333333 0.66666667 0.91666667 0.75 0.63636364 0.54545455] mean value: 0.7257575757575757 key: train_recall value: [1. 0.97087379 0.95145631 0.94174757 0.95098039 0.95098039 0.97058824 0.96078431 0.95145631 0.95145631] mean value: 0.9600323624595469 key: test_roc_auc value: [0.73484848 0.74242424 0.6969697 0.8219697 0.78030303 0.83333333 0.95833333 0.69318182 0.81818182 0.59090909] mean value: 0.7670454545454545 key: train_roc_auc value: [0.97058824 0.96092709 0.95121835 0.95126594 0.96092709 0.95121835 0.96587664 0.95612031 0.95145631 0.94174757] mean value: 0.9561345897582334 key: test_jcc value: [0.53846154 0.6 0.53333333 0.66666667 0.66666667 0.66666667 0.91666667 0.5625 0.63636364 0.4 ] mean value: 0.6187325174825175 key: train_jcc value: [0.94495413 0.92592593 0.90740741 0.90654206 0.92380952 0.90654206 0.93396226 0.91588785 0.90740741 0.89090909] mean value: 0.9163347710667489 MCC on Blind test: 0.34 Accuracy on Blind test: 0.67 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 
'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.14630938 0.13025904 0.12856793 0.12697887 0.1226666 0.12366438 0.12613249 0.12406492 0.1238749 0.1242547 ] mean value: 0.127677321434021 key: score_time value: [0.0091486 0.00918722 0.00929856 0.0083158 0.00833082 0.00817156 0.00841713 0.00818658 0.00882602 0.0082829 ] mean value: 0.0086165189743042 key: test_mcc value: [0.82575758 0.58930667 0.74242424 1. 0.74242424 1. 1. 1. 0.91287093 0.91287093] mean value: 0.8725654585542313 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91304348 0.7826087 0.86956522 1. 0.86956522 1. 1. 1. 0.95454545 0.95454545] mean value: 0.9343873517786562 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.8 0.86956522 1. 0.86956522 1. 1. 1. 0.95652174 0.95652174] mean value: 0.9361264822134387 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.90909091 0.71428571 0.83333333 1. 0.90909091 1. 1. 1. 0.91666667 0.91666667] mean value: 0.9199134199134199 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.90909091 0.90909091 1. 0.83333333 1. 1. 1. 1. 1. ] mean value: 0.956060606060606 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91287879 0.78787879 0.87121212 1. 0.87121212 1. 1. 1. 0.95454545 0.95454545] mean value: 0.9352272727272727 key: train_roc_auc value: [1. 1. 1. 1. 
1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.66666667 0.76923077 1. 0.76923077 1. 1. 1. 0.91666667 0.91666667] mean value: 0.8871794871794871 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.14 Accuracy on Blind test: 0.55 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier',
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.00901484 0.01138854 0.01411247 0.01156902 0.01335311 0.01166534 0.01175451 0.01167417 0.01154208 0.01187825] mean value: 0.011795234680175782 key: score_time value: [0.01050758 0.01051354 0.0105617 0.01048422 0.0105226 0.01054072 0.01047564 0.01055694 0.01061535 0.01278687] mean value: 0.010756516456604004 key: test_mcc value: [0.17236256 0.6992059 0.29359034 0.76764947 0.22268089 0.74242424 0. 
0.22268089 0.56694671 0.61237244] mean value: 0.4299913432106543 key: train_mcc value: [0.56341118 0.60589978 0.60122852 0.61135735 0.48234717 0.56859428 0.4515346 0.56519801 0.56644742 0.60352167] mean value: 0.5619539992238989 key: test_accuracy value: [0.56521739 0.82608696 0.60869565 0.86956522 0.56521739 0.86956522 0.52173913 0.56521739 0.77272727 0.77272727] mean value: 0.6936758893280632 key: train_accuracy value: [0.74634146 0.7804878 0.7902439 0.7804878 0.68780488 0.76585366 0.66829268 0.74146341 0.76213592 0.76699029] mean value: 0.7490101823348331 key: test_fscore value: [0.64285714 0.84615385 0.68965517 0.88 0.70588235 0.86956522 0.68571429 0.70588235 0.8 0.81481481] mean value: 0.7640525185227539 key: train_fscore value: [0.796875 0.81632653 0.81545064 0.81781377 0.76119403 0.8 0.75 0.79377432 0.8 0.81102362] mean value: 0.7962457910535393 key: test_precision value: [0.52941176 0.73333333 0.55555556 0.78571429 0.54545455 0.90909091 0.52173913 0.54545455 0.71428571 0.6875 ] mean value: 0.6527539784029553 key: train_precision value: [0.66666667 0.70422535 0.73076923 0.70138889 0.61445783 0.69565217 0.6 0.65806452 0.69014085 0.68211921] mean value: 0.6743484710173275 key: test_recall value: [0.81818182 1. 0.90909091 1. 1. 0.83333333 1. 1. 0.90909091 1. ] mean value: 0.946969696969697 key: train_recall value: [0.99029126 0.97087379 0.9223301 0.98058252 1. 0.94117647 1. 1. 0.95145631 1. 
] mean value: 0.9756710451170759 key: test_roc_auc value: [0.57575758 0.83333333 0.62121212 0.875 0.54545455 0.87121212 0.5 0.54545455 0.77272727 0.77272727] mean value: 0.6912878787878788 key: train_roc_auc value: [0.74514563 0.77955454 0.78959642 0.77950695 0.68932039 0.76670474 0.66990291 0.74271845 0.76213592 0.76699029] mean value: 0.7491576242147344 key: test_jcc value: [0.47368421 0.73333333 0.52631579 0.78571429 0.54545455 0.76923077 0.52173913 0.54545455 0.66666667 0.6875 ] mean value: 0.6255093276288928 key: train_jcc value: [0.66233766 0.68965517 0.6884058 0.69178082 0.61445783 0.66666667 0.6 0.65806452 0.66666667 0.68211921] mean value: 0.6620154339856393 MCC on Blind test: 0.32 Accuracy on Blind test: 0.6 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, 
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01143074 0.0101707 0.01010108 0.0101788 0.01018381 0.01013422 0.01016164 0.01012945 0.01014853 0.010252 ] mean value: 0.01028909683227539 key: score_time value: [0.01026726 0.0102675 0.01020479 0.01026201 0.01026845 0.01027465 0.01028585 0.01047206 0.01028395 0.0104382 ] mean value: 0.010302472114562988 key: test_mcc value: [0.62050523 0.74242424 0.39393939 0.83743579 0.74047959 0.91666667 0.91605722 0.82575758 1. 0.83205029] mean value: 0.7825316005376914 key: train_mcc value: [0.84404459 0.82455974 0.86409538 0.82438607 0.86356283 0.81495251 0.83417421 0.84389872 0.83499081 0.83499081] mean value: 0.8383655666865866 key: test_accuracy value: [0.7826087 0.86956522 0.69565217 0.91304348 0.86956522 0.95652174 0.95652174 0.91304348 1. 0.90909091] mean value: 0.8865612648221344 key: train_accuracy value: [0.92195122 0.91219512 0.93170732 0.91219512 0.93170732 0.90731707 0.91707317 0.92195122 0.91747573 0.91747573] mean value: 0.919104901728629 key: test_fscore value: [0.70588235 0.86956522 0.69565217 0.9 0.88 0.95652174 0.96 0.91666667 1. 0.9 ] mean value: 0.8784288150042625 key: train_fscore value: [0.92307692 0.91176471 0.93069307 0.91262136 0.93069307 0.90547264 0.91625616 0.92156863 0.9178744 0.9178744 ] mean value: 0.9187895340969339 key: test_precision value: [1. 0.83333333 0.66666667 1. 0.84615385 1. 0.92307692 0.91666667 1. 1. 
] mean value: 0.9185897435897435 key: train_precision value: [0.91428571 0.92079208 0.94949495 0.91262136 0.94 0.91919192 0.92079208 0.92156863 0.91346154 0.91346154] mean value: 0.9225669804985783 key: test_recall value: [0.54545455 0.90909091 0.72727273 0.81818182 0.91666667 0.91666667 1. 0.91666667 1. 0.81818182] mean value: 0.8568181818181818 key: train_recall value: [0.93203883 0.90291262 0.91262136 0.91262136 0.92156863 0.89215686 0.91176471 0.92156863 0.9223301 0.9223301 ] mean value: 0.915191319246145 key: test_roc_auc value: [0.77272727 0.87121212 0.6969697 0.90909091 0.86742424 0.95833333 0.95454545 0.91287879 1. 0.90909091] mean value: 0.8852272727272728 key: train_roc_auc value: [0.92190177 0.91224062 0.93180088 0.91219303 0.9316581 0.90724348 0.9170474 0.92194936 0.91747573 0.91747573] mean value: 0.9190986103179135 key: test_jcc value: [0.54545455 0.76923077 0.53333333 0.81818182 0.78571429 0.91666667 0.92307692 0.84615385 1. 0.81818182] mean value: 0.7955994005994006 key: train_jcc value: [0.85714286 0.83783784 0.87037037 0.83928571 0.87037037 0.82727273 0.84545455 0.85454545 0.84821429 0.84821429] mean value: 0.8498708448708449 MCC on Blind test: 0.19 Accuracy on Blind test: 0.59 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:163: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:166: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], 
ascending = False, inplace = True) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.1247797 0.08746672 0.08165932 0.08129358 0.08163166 0.08223915 0.0814786 0.08128548 0.08197618 0.09512329] mean value: 0.08789336681365967 key: score_time value: [0.01067257 0.01061177 0.01049924 0.01049209 0.01054454 0.01051307 0.01054621 0.0105319 0.01048279 0.01061583] mean value: 0.010550999641418457 key: test_mcc value: [0.69084928 0.65151515 0.39393939 0.91605722 0.74047959 0.91666667 0.74047959 0.91605722 0.83205029 0.83205029] mean value: 0.7630144710301748 key: train_mcc value: [0.85400014 0.87352395 0.89272796 0.81467733 0.86356283 0.84389872 0.84389872 0.86358877 0.83499081 0.83499081] mean value: 0.8519860051204752 key: test_accuracy value: [0.82608696 0.82608696 0.69565217 0.95652174 0.86956522 0.95652174 0.86956522 0.95652174 0.90909091 0.90909091] mean value: 0.8774703557312253 key: train_accuracy value: [0.92682927 0.93658537 0.94634146 0.90731707 0.93170732 0.92195122 0.92195122 0.93170732 0.91747573 0.91747573] mean value: 0.925934170021312 key: test_fscore value: [0.77777778 0.81818182 0.69565217 0.95238095 0.88 0.95652174 0.88 0.96 0.9 0.9 ] mean value: 0.8720514461384027 key: 
train_fscore value: [0.92822967 0.93779904 0.94634146 0.90731707 0.93069307 0.92156863 0.92156863 0.93203883 0.9178744 0.9178744 ] mean value: 0.9261305196150217 key: test_precision value: [1. 0.81818182 0.66666667 1. 0.84615385 1. 0.84615385 0.92307692 1. 1. ] mean value: 0.91002331002331 key: train_precision value: [0.91509434 0.9245283 0.95098039 0.91176471 0.94 0.92156863 0.92156863 0.92307692 0.91346154 0.91346154] mean value: 0.9235504994450611 key: test_recall value: [0.63636364 0.81818182 0.72727273 0.90909091 0.91666667 0.91666667 0.91666667 1. 0.81818182 0.81818182] mean value: 0.8477272727272728 key: train_recall value: [0.94174757 0.95145631 0.94174757 0.90291262 0.92156863 0.92156863 0.92156863 0.94117647 0.9223301 0.9223301 ] mean value: 0.9288406624785837 key: test_roc_auc value: [0.81818182 0.82575758 0.6969697 0.95454545 0.86742424 0.95833333 0.86742424 0.95454545 0.90909091 0.90909091] mean value: 0.8761363636363636 key: train_roc_auc value: [0.92675614 0.93651247 0.94636398 0.90733866 0.9316581 0.92194936 0.92194936 0.93175328 0.91747573 0.91747573] mean value: 0.9259232819341329 key: test_jcc value: [0.63636364 0.69230769 0.53333333 0.90909091 0.78571429 0.91666667 0.78571429 0.92307692 0.81818182 0.81818182] mean value: 0.7818631368631369 key: train_jcc value: [0.86607143 0.88288288 0.89814815 0.83035714 0.87037037 0.85454545 0.85454545 0.87272727 0.84821429 0.84821429] mean value: 0.8626076726076726 MCC on Blind test: 0.12 Accuracy on Blind test: 0.56 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.01771116 0.01344275 0.0129807 0.01292109 0.01246119 0.01508546 0.01303816 0.0129621 0.01367259 0.01364589] mean value: 0.013792109489440919 key: score_time value: [0.0105226 0.00815105 0.00787663 0.00783443 0.00784397 0.00815797 0.00775981 0.00775981 0.0080657 0.0077579 ] mean value: 0.008172988891601562 key: test_mcc value: [ 0.56407607 0.875 0.63245553 0.57735027 0.57735027 0.57735027 1. -0.14285714 0.31622777 0.28867513] mean value: 0.5265628172174828 key: train_mcc value: [0.8114612 0.76470609 0.70321085 0.75 0.73446466 0.78278036 0.71910121 0.75146915 0.73446466 0.78163175] mean value: 0.7533289937125471 key: test_accuracy value: [0.73333333 0.93333333 0.78571429 0.78571429 0.78571429 0.78571429 1. 0.42857143 0.64285714 0.64285714] mean value: 0.7523809523809524 key: train_accuracy value: [0.90551181 0.88188976 0.8515625 0.875 0.8671875 0.890625 0.859375 0.875 0.8671875 0.890625 ] mean value: 0.876396407480315 key: test_fscore value: [0.77777778 0.93333333 0.82352941 0.76923077 0.76923077 0.76923077 1. 0.42857143 0.70588235 0.66666667] mean value: 0.7643453278747396 key: train_fscore value: [0.9047619 0.88372093 0.85271318 0.875 0.86821705 0.89393939 0.86153846 0.87878788 0.86821705 0.89230769] mean value: 0.8779203548389595 key: test_precision value: [0.63636364 1. 0.7 0.83333333 0.83333333 0.83333333 1. 0.42857143 0.6 0.625 ] mean value: 0.7489935064935065 key: train_precision value: [0.91935484 0.86363636 0.84615385 0.875 0.86153846 0.86764706 0.84848485 0.85294118 0.86153846 0.87878788] mean value: 0.8675082934143655 key: test_recall value: [1. 0.875 1. 
0.71428571 0.71428571 0.71428571 1. 0.42857143 0.85714286 0.71428571] mean value: 0.8017857142857143 key: train_recall value: [0.890625 0.9047619 0.859375 0.875 0.875 0.921875 0.875 0.90625 0.875 0.90625 ] mean value: 0.8889136904761905 key: test_roc_auc value: [0.75 0.9375 0.78571429 0.78571429 0.78571429 0.78571429 1. 0.42857143 0.64285714 0.64285714] mean value: 0.7544642857142857 key: train_roc_auc value: [0.90562996 0.88206845 0.8515625 0.875 0.8671875 0.890625 0.859375 0.875 0.8671875 0.890625 ] mean value: 0.8764260912698413 key: test_jcc value: [0.63636364 0.875 0.7 0.625 0.625 0.625 1. 0.27272727 0.54545455 0.5 ] mean value: 0.6404545454545454 key: train_jcc value: [0.82608696 0.79166667 0.74324324 0.77777778 0.76712329 0.80821918 0.75675676 0.78378378 0.76712329 0.80555556] mean value: 0.782733649373018 MCC on Blind test: 0.38 Accuracy on Blind test: 0.69 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.38585448 0.3669436 0.36463881 0.37112308 0.37720537 0.37863564 0.39239931 0.38488078 0.38222742 0.37554836] mean value: 0.37794568538665774 key: score_time value: [0.00842977 0.00810385 0.00817847 0.00810599 0.00813341 0.00875115 0.00849652 0.00883865 0.00872874 0.0085032 ] mean value: 0.008426976203918458 key: test_mcc value: [0.56407607 0.46428571 0.71428571 0.4472136 0.8660254 0.57735027 1. 0.4472136 0.42857143 0.42857143] mean value: 0.5937593224506033 key: train_mcc value: [1. 0.95287698 1. 0.9379581 1. 0.95417386 0.98449518 0.96922337 1. 0.95324137] mean value: 0.9751968870278498 key: test_accuracy value: [0.73333333 0.73333333 0.85714286 0.71428571 0.92857143 0.78571429 1. 0.71428571 0.71428571 0.71428571] mean value: 0.7895238095238095 key: train_accuracy value: [1. 0.97637795 1. 0.96875 1. 0.9765625 0.9921875 0.984375 1. 
0.9765625 ] mean value: 0.9874815452755905 key: test_fscore value: [0.77777778 0.75 0.85714286 0.66666667 0.92307692 0.76923077 1. 0.66666667 0.71428571 0.71428571] mean value: 0.7839133089133089 key: train_fscore value: [1. 0.97637795 1. 0.96923077 1. 0.97709924 0.99224806 0.98461538 1. 0.97674419] mean value: 0.9876315591305296 key: test_precision value: [0.63636364 0.75 0.85714286 0.8 1. 0.83333333 1. 0.8 0.71428571 0.71428571] mean value: 0.8105411255411256 key: train_precision value: [1. 0.96875 1. 0.95454545 1. 0.95522388 0.98461538 0.96969697 1. 0.96923077] mean value: 0.9802062458685593 key: test_recall value: [1. 0.75 0.85714286 0.57142857 0.85714286 0.71428571 1. 0.57142857 0.71428571 0.71428571] mean value: 0.775 key: train_recall value: [1. 0.98412698 1. 0.984375 1. 1. 1. 1. 1. 0.984375 ] mean value: 0.9952876984126984 key: test_roc_auc value: [0.75 0.73214286 0.85714286 0.71428571 0.92857143 0.78571429 1. 0.71428571 0.71428571 0.71428571] mean value: 0.7910714285714286 key: train_roc_auc value: [1. 0.97643849 1. 0.96875 1. 0.9765625 0.9921875 0.984375 1. 0.9765625 ] mean value: 0.9874875992063492 key: test_jcc value: [0.63636364 0.6 0.75 0.5 0.85714286 0.625 1. 0.5 0.55555556 0.55555556] mean value: 0.6579617604617605 key: train_jcc value: [1. 0.95384615 1. 0.94029851 1. 0.95522388 0.98461538 0.96969697 1. 
0.95454545] mean value: 0.9758226350763665 MCC on Blind test: 0.09 Accuracy on Blind test: 0.54 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.00919986 0.0087204 0.0068109 0.00678849 0.00648975 0.00657201 0.00651431 0.00667238 0.00647902 0.00650811] mean value: 0.0070755243301391605 key: score_time value: [0.0102222 0.01021814 0.00809312 0.00810909 0.00769401 0.00771046 0.0077498 0.00773859 0.00767064 0.0077889 ] mean value: 0.008299493789672851 key: test_mcc value: [ 0.56407607 0.34247476 0.2773501 0.63245553 0.52223297 0. 
0.52223297 -0.17407766 0.17407766 0.31622777] mean value: 0.3177050166425868 key: train_mcc value: [0.42609813 0.36309219 0.42452948 0.45355737 0.43819207 0.40213949 0.40574111 0.51298918 0.40574111 0.46530981] mean value: 0.4297389942041204 key: test_accuracy value: [0.73333333 0.66666667 0.57142857 0.78571429 0.71428571 0.5 0.71428571 0.42857143 0.57142857 0.64285714] mean value: 0.6328571428571429 key: train_accuracy value: [0.68503937 0.61417323 0.6875 0.6875 0.6953125 0.6640625 0.671875 0.734375 0.671875 0.7109375 ] mean value: 0.6822650098425197 key: test_fscore value: [0.77777778 0.73684211 0.7 0.82352941 0.77777778 0.58823529 0.77777778 0.55555556 0.66666667 0.70588235] mean value: 0.7110044719642242 key: train_fscore value: [0.75 0.72 0.74683544 0.75609756 0.75159236 0.73939394 0.74074074 0.77922078 0.74074074 0.76129032] mean value: 0.7485911883378328 key: test_precision value: [0.63636364 0.63636364 0.53846154 0.7 0.63636364 0.5 0.63636364 0.45454545 0.54545455 0.6 ] mean value: 0.5883916083916084 key: train_precision value: [0.625 0.5625 0.62765957 0.62 0.6344086 0.6039604 0.6122449 0.66666667 0.6122449 0.64835165] mean value: 0.6213036683594909 key: test_recall value: [1. 0.875 1. 1. 1. 0.71428571 1. 0.71428571 0.85714286 0.85714286] mean value: 0.9017857142857143 key: train_recall value: [0.9375 1. 
0.921875 0.96875 0.921875 0.953125 0.9375 0.9375 0.9375 0.921875] mean value: 0.94375 key: test_roc_auc value: [0.75 0.65178571 0.57142857 0.78571429 0.71428571 0.5 0.71428571 0.42857143 0.57142857 0.64285714] mean value: 0.6330357142857143 key: train_roc_auc value: [0.68303571 0.6171875 0.6875 0.6875 0.6953125 0.6640625 0.671875 0.734375 0.671875 0.7109375 ] mean value: 0.6823660714285714 key: test_jcc value: [0.63636364 0.58333333 0.53846154 0.7 0.63636364 0.41666667 0.63636364 0.38461538 0.5 0.54545455] mean value: 0.5577622377622378 key: train_jcc value: [0.6 0.5625 0.5959596 0.60784314 0.60204082 0.58653846 0.58823529 0.63829787 0.58823529 0.61458333] mean value: 0.5984233804988544 MCC on Blind test: 0.42 Accuracy on Blind test: 0.69 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, 
random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00683665 0.00669193 0.00670457 0.00669694 0.00665402 0.00670123 0.00669932 0.00667048 0.00668812 0.0066514 ] mean value: 0.006699466705322265 key: score_time value: [0.00776768 0.0077312 0.0078032 0.00768924 0.0077219 0.00774479 0.00771761 0.00768328 0.00771379 0.00770187] mean value: 0.0077274560928344725 key: test_mcc value: [-0.19642857 0.47245559 0.63245553 -0.14285714 0. 0.4472136 0.4472136 0. 0.28867513 0.1490712 ] mean value: 0.20977989331042105 key: train_mcc value: [0.35590281 0.40535457 0.36154406 0.34995662 0.43771378 0.36480373 0.40704579 0.34391797 0.37665889 0.375 ] mean value: 0.37778982176154485 key: test_accuracy value: [0.4 0.73333333 0.78571429 0.42857143 0.5 0.71428571 0.71428571 0.5 0.64285714 0.57142857] mean value: 0.599047619047619 key: train_accuracy value: [0.67716535 0.7007874 0.6796875 0.671875 0.71875 0.6796875 0.703125 0.671875 0.6875 0.6875 ] mean value: 0.6877952755905512 key: test_fscore value: [0.4 0.77777778 0.82352941 0.42857143 0.53333333 0.66666667 0.75 0.53333333 0.66666667 0.5 ] mean value: 0.6079878618113912 key: train_fscore value: [0.6962963 0.71641791 0.6962963 0.7 0.71428571 0.70503597 0.71212121 0.67692308 0.70149254 0.6875 ] mean value: 0.7006369014906811 key: test_precision value: [0.375 0.7 0.7 0.42857143 0.5 0.8 0.66666667 0.5 0.625 0.6 ] mean value: 0.5895238095238096 key: train_precision value: [0.66197183 0.67605634 0.66197183 0.64473684 0.72580645 0.65333333 0.69117647 0.66666667 0.67142857 0.6875 ] mean value: 0.6740648335734973 key: test_recall value: [0.42857143 0.875 1. 
0.42857143 0.57142857 0.57142857 0.85714286 0.57142857 0.71428571 0.42857143] mean value: 0.6446428571428571 key: train_recall value: [0.734375 0.76190476 0.734375 0.765625 0.703125 0.765625 0.734375 0.6875 0.734375 0.6875 ] mean value: 0.7308779761904762 key: test_roc_auc value: [0.40178571 0.72321429 0.78571429 0.42857143 0.5 0.71428571 0.71428571 0.5 0.64285714 0.57142857] mean value: 0.5982142857142857 key: train_roc_auc value: [0.67671131 0.70126488 0.6796875 0.671875 0.71875 0.6796875 0.703125 0.671875 0.6875 0.6875 ] mean value: 0.687797619047619 key: test_jcc value: [0.25 0.63636364 0.7 0.27272727 0.36363636 0.5 0.6 0.36363636 0.5 0.33333333] mean value: 0.45196969696969697 key: train_jcc value: [0.53409091 0.55813953 0.53409091 0.53846154 0.55555556 0.54444444 0.55294118 0.51162791 0.54022989 0.52380952] mean value: 0.5393391383841405 MCC on Blind test: 0.39 Accuracy on Blind test: 0.69 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00661182 0.00731397 0.0074687 0.00730467 0.00664449 0.00739288 0.00724483 0.00748181 0.0071733 0.00716472] mean value: 0.007180118560791015 key: score_time value: [0.00903392 0.00946617 0.01499748 0.01391649 0.00954628 0.01419783 0.01409173 0.00956845 0.00938296 0.00936103] mean value: 0.011356234550476074 key: test_mcc value: [ 0.34247476 0.04029115 0.14285714 -0.1490712 -0.31622777 0.28867513 0.14285714 0.14285714 0.14285714 0.1490712 ] mean value: 0.0926641847918602 key: train_mcc value: [0.52955101 0.59052579 0.59491308 0.5172058 0.48729852 0.51568795 0.37518324 0.50221186 0.5787612 0.53229065] mean value: 0.5223629097954794 key: test_accuracy value: [0.66666667 0.53333333 0.57142857 0.42857143 0.35714286 0.64285714 0.57142857 0.57142857 0.57142857 0.57142857] mean value: 0.5485714285714286 key: train_accuracy value: [0.76377953 0.79527559 0.796875 0.7578125 0.7421875 0.7578125 0.6875 0.75 0.7890625 0.765625 ] mean value: 0.7605930118110236 key: test_fscore value: [0.54545455 0.63157895 0.57142857 0.5 0.18181818 0.66666667 0.57142857 0.57142857 0.57142857 0.5 ] mean value: 0.53112326270221 key: train_fscore value: [0.7761194 0.79365079 0.79032258 0.76691729 0.72727273 0.75590551 0.68253968 0.73770492 0.784 0.75806452] mean value: 0.7572497426299365 key: test_precision value: [0.75 0.54545455 0.57142857 0.44444444 0.25 0.625 0.57142857 0.57142857 0.57142857 0.6 ] mean value: 0.5500613275613275 key: train_precision value: [0.74285714 0.79365079 0.81666667 0.73913043 0.77192982 0.76190476 0.69354839 0.77586207 
0.80327869 0.78333333] mean value: 0.7682162102343593 key: test_recall value: [0.42857143 0.75 0.57142857 0.57142857 0.14285714 0.71428571 0.57142857 0.57142857 0.57142857 0.42857143] mean value: 0.5321428571428571 key: train_recall value: [0.8125 0.79365079 0.765625 0.796875 0.6875 0.75 0.671875 0.703125 0.765625 0.734375 ] mean value: 0.7481150793650794 key: test_roc_auc value: [0.65178571 0.51785714 0.57142857 0.42857143 0.35714286 0.64285714 0.57142857 0.57142857 0.57142857 0.57142857] mean value: 0.5455357142857142 key: train_roc_auc value: [0.76339286 0.7952629 0.796875 0.7578125 0.7421875 0.7578125 0.6875 0.75 0.7890625 0.765625 ] mean value: 0.7605530753968254 key: test_jcc value: [0.375 0.46153846 0.4 0.33333333 0.1 0.5 0.4 0.4 0.4 0.33333333] mean value: 0.3703205128205128 key: train_jcc value: [0.63414634 0.65789474 0.65333333 0.62195122 0.57142857 0.60759494 0.51807229 0.58441558 0.64473684 0.61038961] mean value: 0.6103963465355565 MCC on Blind test: 0.2 Accuracy on Blind test: 0.6 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.00949454 0.00895095 0.0088582 0.00882721 0.00884652 0.00871706 0.00863242 0.00808167 0.00882864 0.00886631] mean value: 0.00881035327911377 key: score_time value: [0.00895262 0.00876713 0.00880933 0.00889349 0.00878167 0.00939393 0.00883389 0.00905347 0.00875926 0.00885344] mean value: 0.008909821510314941 key: test_mcc value: [ 0.49099025 0.6000992 0.31622777 0.42857143 0.28867513 0.57735027 0.57735027 -0.17407766 0. 0.1490712 ] mean value: 0.3254257861286581 key: train_mcc value: [0.76388889 0.75156113 0.70389875 0.68884672 0.67195703 0.73518314 0.62776482 0.67261436 0.67195703 0.76571848] mean value: 0.7053390350757779 key: test_accuracy value: [0.73333333 0.8 0.64285714 0.71428571 0.64285714 0.78571429 0.78571429 0.42857143 0.5 0.57142857] mean value: 0.6604761904761904 key: train_accuracy value: [0.88188976 0.87401575 0.8515625 0.84375 0.8359375 0.8671875 0.8125 0.8359375 0.8359375 0.8828125 ] mean value: 0.8521530511811024 key: test_fscore value: [0.75 0.82352941 0.70588235 0.71428571 0.61538462 0.76923077 0.76923077 0.55555556 0.53333333 0.5 ] mean value: 0.673643252172664 key: train_fscore value: [0.88188976 0.87878788 0.85496183 0.84848485 0.83464567 0.87022901 0.82089552 0.83969466 0.83464567 0.88372093] mean value: 0.8547955778438756 key: test_precision value: [0.66666667 0.77777778 0.6 0.71428571 0.66666667 0.83333333 0.83333333 0.45454545 0.5 0.6 ] mean value: 0.6646608946608946 key: train_precision value: [0.88888889 0.84057971 0.8358209 0.82352941 0.84126984 0.85074627 0.78571429 0.82089552 0.84126984 
0.87692308] mean value: 0.8405637742542732 key: test_recall value: [0.85714286 0.875 0.85714286 0.71428571 0.57142857 0.71428571 0.71428571 0.71428571 0.57142857 0.42857143] mean value: 0.7017857142857142 key: train_recall value: [0.875 0.92063492 0.875 0.875 0.828125 0.890625 0.859375 0.859375 0.828125 0.890625 ] mean value: 0.870188492063492 key: test_roc_auc value: [0.74107143 0.79464286 0.64285714 0.71428571 0.64285714 0.78571429 0.78571429 0.42857143 0.5 0.57142857] mean value: 0.6607142857142857 key: train_roc_auc value: [0.88194444 0.87437996 0.8515625 0.84375 0.8359375 0.8671875 0.8125 0.8359375 0.8359375 0.8828125 ] mean value: 0.8521949404761905 key: test_jcc value: [0.6 0.7 0.54545455 0.55555556 0.44444444 0.625 0.625 0.38461538 0.36363636 0.33333333] mean value: 0.5177039627039627 key: train_jcc value: [0.78873239 0.78378378 0.74666667 0.73684211 0.71621622 0.77027027 0.69620253 0.72368421 0.71621622 0.79166667] mean value: 0.747028106162106 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.4545188 0.45730591 0.4523077 0.58290815 0.4624598 0.48649025 0.46417975 0.6089313 0.45516968 0.47067261] mean value: 0.4894943952560425 key: score_time value: [0.01082301 0.01304746 0.01290703 0.01316142 0.01308894 0.01309848 0.01328969 0.01329231 0.01081371 0.01333547] mean value: 0.012685751914978028 key: test_mcc value: [0.49099025 0.05455447 0.42857143 0.42857143 0.57735027 0.57735027 1. 0.14285714 0. 0.1490712 ] mean value: 0.38493164624692183 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73333333 0.53333333 0.71428571 0.71428571 0.78571429 0.78571429 1. 0.57142857 0.5 0.57142857] mean value: 0.690952380952381 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.75 0.58823529 0.71428571 0.71428571 0.76923077 0.76923077 1. 0.57142857 0.53333333 0.5 ] mean value: 0.6910030165912519 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 0.55555556 0.71428571 0.71428571 0.83333333 0.83333333 1. 0.57142857 0.5 0.6 ] mean value: 0.6988888888888889 key: train_precision value: [1. 1. 1. 1. 1. 
1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.85714286 0.625 0.71428571 0.71428571 0.71428571 0.71428571 1. 0.57142857 0.57142857 0.42857143] mean value: 0.6910714285714286 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.74107143 0.52678571 0.71428571 0.71428571 0.78571429 0.78571429 1. 0.57142857 0.5 0.57142857] mean value: 0.6910714285714286 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6 0.41666667 0.55555556 0.55555556 0.625 0.625 1. 0.4 0.36363636 0.33333333] mean value: 0.5474747474747474 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.15 Accuracy on Blind test: 0.57 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01064634 0.00947118 0.00758362 0.00736046 0.00720119 0.00724173 0.00704122 0.00714111 0.00716519 0.00737357] mean value: 0.007822561264038085 key: score_time value: [0.01261735 0.00878739 0.00803566 0.0079627 0.00771618 0.00781584 0.00776696 0.00778174 0.00768185 0.00766587] mean value: 0.00838315486907959 key: test_mcc value: [0.66143783 0.875 1. 
0.52223297 0.71428571 0.8660254 0.74535599 0.1490712 0.8660254 0.28867513] mean value: 0.6688109643082562 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8 0.93333333 1. 0.71428571 0.85714286 0.92857143 0.85714286 0.57142857 0.92857143 0.64285714] mean value: 0.8233333333333334 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82352941 0.93333333 1. 0.6 0.85714286 0.93333333 0.83333333 0.625 0.93333333 0.61538462] mean value: 0.8154390217625511 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7 1. 1. 1. 0.85714286 0.875 1. 0.55555556 0.875 0.66666667] mean value: 0.8529365079365079 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.875 1. 0.42857143 0.85714286 1. 0.71428571 0.71428571 1. 0.57142857] mean value: 0.8160714285714286 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8125 0.9375 1. 0.71428571 0.85714286 0.92857143 0.85714286 0.57142857 0.92857143 0.64285714] mean value: 0.8250000000000001 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.7 0.875 1. 0.42857143 0.75 0.875 0.71428571 0.45454545 0.875 0.44444444] mean value: 0.7116847041847042 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.55 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.07988238 0.07995939 0.07999921 0.08072162 0.07959414 0.08084035 0.08072901 0.08017445 0.08056831 0.07979488] mean value: 0.08022637367248535 key: score_time value: [0.01623416 0.01625252 0.01616454 0.01613855 0.01608205 0.01615548 0.01745152 0.01737761 0.01639533 0.01741838] mean value: 0.016567015647888185 key: test_mcc value: [0.37796447 0.875 0.8660254 0.71428571 0.8660254 0.8660254 1. 0.14285714 0.57735027 0.28867513] mean value: 0.6574208945289839 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.66666667 0.93333333 0.92857143 0.85714286 0.92857143 0.92857143 1. 0.57142857 0.78571429 0.64285714] mean value: 0.8242857142857143 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.70588235 0.93333333 0.93333333 0.85714286 0.92307692 0.92307692 1. 0.57142857 0.8 0.61538462] mean value: 0.8262658909717733 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.6 1. 0.875 0.85714286 1. 1. 1. 0.57142857 0.75 0.66666667] mean value: 0.8320238095238095 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.85714286 0.875 1. 0.85714286 0.85714286 0.85714286 1. 0.57142857 0.85714286 0.57142857] mean value: 0.8303571428571428 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.67857143 0.9375 0.92857143 0.85714286 0.92857143 0.92857143 1. 0.57142857 0.78571429 0.64285714] mean value: 0.8258928571428572 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.54545455 0.875 0.875 0.75 0.85714286 0.85714286 1. 0.4 0.66666667 0.44444444] mean value: 0.7270851370851371 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.28 Accuracy on Blind test: 0.64 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, 
learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00677133 0.00666976 0.00663257 0.00673056 0.00663328 0.00662422 0.00719857 0.00701356 0.00682855 0.00671649] mean value: 0.006781888008117676 key: score_time value: [0.00769711 0.00769949 0.00777817 0.00801635 0.00804901 0.00826836 0.00769639 0.00777459 0.00769544 0.00789642] mean value: 0.007857131958007812 key: test_mcc value: [-0.07142857 0.33928571 0.28867513 0.28867513 0.42857143 0. 0.4472136 0.1490712 0. 0. ] mean value: 0.1870063634618141 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.46666667 0.66666667 0.64285714 0.64285714 0.71428571 0.5 0.71428571 0.57142857 0.5 0.5 ] mean value: 0.5919047619047619 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.42857143 0.66666667 0.61538462 0.66666667 0.71428571 0.46153846 0.75 0.5 0.58823529 0.36363636] mean value: 0.5754985210867564 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.42857143 0.71428571 0.66666667 0.625 0.71428571 0.5 0.66666667 0.6 0.5 0.5 ] mean value: 0.591547619047619 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.42857143 0.625 0.57142857 0.71428571 0.71428571 0.42857143 0.85714286 0.42857143 0.71428571 0.28571429] mean value: 0.5767857142857142 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.46428571 0.66964286 0.64285714 0.64285714 0.71428571 0.5 0.71428571 0.57142857 0.5 0.5 ] mean value: 0.5919642857142857 key: train_roc_auc value: [1. 
1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.27272727 0.5 0.44444444 0.5 0.55555556 0.3 0.6 0.33333333 0.41666667 0.22222222] mean value: 0.41449494949494947 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.24 Accuracy on Blind test: 0.62 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost 
Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.03270388 1.02527833 1.038306 1.02501392 1.02789044 1.00910234 1.033144 1.01801777 1.01170015 1.01512265] mean value: 1.0236279487609863 key: score_time value: [0.09661889 0.0889287 0.09118915 0.0894289 0.09080982 0.08694053 0.09180784 0.09045506 0.08735704 0.09318399] mean value: 0.09067199230194092 key: test_mcc value: [0.56407607 0.875 1. 0.71428571 0.71428571 1. 1. 0. 0.8660254 0.57735027] mean value: 0.7311023176363259 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73333333 0.93333333 1. 0.85714286 0.85714286 1. 1. 0.5 0.92857143 0.78571429] mean value: 0.8595238095238095 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.77777778 0.93333333 1. 0.85714286 0.85714286 1. 1. 
0.53333333 0.93333333 0.8 ] mean value: 0.8692063492063492 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.63636364 1. 1. 0.85714286 0.85714286 1. 1. 0.5 0.875 0.75 ] mean value: 0.847564935064935 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.875 1. 0.85714286 0.85714286 1. 1. 0.57142857 1. 0.85714286] mean value: 0.9017857142857143 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75 0.9375 1. 0.85714286 0.85714286 1. 1. 0.5 0.92857143 0.78571429] mean value: 0.8616071428571429 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.63636364 0.875 1. 0.75 0.75 1. 1. 0.36363636 0.875 0.66666667] mean value: 0.7916666666666666 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.2 Accuracy on Blind test: 0.59 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.85148787 0.9046824 0.85188794 0.83856058 0.89779782 0.83885241 0.90653539 0.82567453 0.8553915 0.97453666] mean value: 0.8745407104492188 key: score_time value: [0.34471321 0.19580126 0.22999787 0.1582098 0.21987033 0.21252227 0.18508577 0.23894954 0.15599632 0.22014427] mean value: 0.21612906455993652 key: test_mcc value: [ 0.56407607 0.875 0.74535599 0.71428571 0.57735027 0.8660254 0.8660254 -0.14285714 0.71428571 0.42857143] mean value: 0.6208118858361914 key: train_mcc value: [0.93745372 0.93748452 0.92288947 0.92288947 0.9379581 0.95417386 0.95324137 0.93933644 0.90802522 0.93933644] mean value: 0.9352788622064109 key: test_accuracy value: [0.73333333 0.93333333 0.85714286 0.85714286 0.78571429 0.92857143 0.92857143 0.42857143 0.85714286 0.71428571] mean value: 0.8023809523809524 key: train_accuracy value: [0.96850394 0.96850394 0.9609375 0.9609375 0.96875 0.9765625 0.9765625 0.96875 0.953125 0.96875 ] mean value: 0.9671382874015748 key: test_fscore value: [0.77777778 0.93333333 0.875 0.85714286 0.76923077 0.92307692 0.92307692 0.42857143 0.85714286 0.71428571] mean value: 0.8058638583638583 key: train_fscore value: [0.96923077 0.96875 0.96183206 0.96183206 0.96923077 0.97709924 0.97674419 0.96969697 0.95454545 0.96969697] mean value: 0.967865847722607 key: test_precision value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. 
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( [0.63636364 1. 0.77777778 0.85714286 0.83333333 1. 1. 0.42857143 0.85714286 0.71428571] mean value: 0.8104617604617604 key: train_precision value: [0.95454545 0.95384615 0.94029851 0.94029851 0.95454545 0.95522388 0.96923077 0.94117647 0.92647059 0.94117647] mean value: 0.9476812257101985 key: test_recall value: [1. 0.875 1. 0.85714286 0.71428571 0.85714286 0.85714286 0.42857143 0.85714286 0.71428571] mean value: 0.8160714285714286 key: train_recall value: [0.984375 0.98412698 0.984375 0.984375 0.984375 1. 0.984375 1. 0.984375 1. ] mean value: 0.9890376984126984 key: test_roc_auc value: [0.75 0.9375 0.85714286 0.85714286 0.78571429 0.92857143 0.92857143 0.42857143 0.85714286 0.71428571] mean value: 0.8044642857142857 key: train_roc_auc value: [0.96837798 0.96862599 0.9609375 0.9609375 0.96875 0.9765625 0.9765625 0.96875 0.953125 0.96875 ] mean value: 0.9671378968253969 key: test_jcc value: [0.63636364 0.875 0.77777778 0.75 0.625 0.85714286 0.85714286 0.27272727 0.75 0.55555556] mean value: 0.6956709956709957 key: train_jcc value: [0.94029851 0.93939394 0.92647059 0.92647059 0.94029851 0.95522388 0.95454545 0.94117647 0.91304348 0.94117647] mean value: 0.9378097885369711 MCC on Blind test: 0.29 Accuracy on Blind test: 0.63 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random 
Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01670504 0.00717616 0.00681663 0.00686646 0.00743771 0.00687003 0.00733042 0.00698042 0.00724292 0.00684094] mean value: 0.008026671409606934 key: score_time value: [0.0158987 0.00824714 0.00798559 0.0079875 0.00846529 0.00800943 0.00837207 0.00800943 0.00845408 0.00799131] mean value: 0.008942055702209472 key: test_mcc value: [-0.19642857 0.47245559 0.63245553 -0.14285714 0. 0.4472136 0.4472136 0. 0.28867513 0.1490712 ] mean value: 0.20977989331042105 key: train_mcc value: [0.35590281 0.40535457 0.36154406 0.34995662 0.43771378 0.36480373 0.40704579 0.34391797 0.37665889 0.375 ] mean value: 0.37778982176154485 key: test_accuracy value: [0.4 0.73333333 0.78571429 0.42857143 0.5 0.71428571 0.71428571 0.5 0.64285714 0.57142857] mean value: 0.599047619047619 key: train_accuracy value: [0.67716535 0.7007874 0.6796875 0.671875 0.71875 0.6796875 0.703125 0.671875 0.6875 0.6875 ] mean value: 0.6877952755905512 key: test_fscore value: [0.4 0.77777778 0.82352941 0.42857143 0.53333333 0.66666667 0.75 0.53333333 0.66666667 0.5 ] mean value: 0.6079878618113912 key: train_fscore value: [0.6962963 0.71641791 0.6962963 0.7 0.71428571 0.70503597 0.71212121 0.67692308 0.70149254 0.6875 ] mean value: 0.7006369014906811 key: test_precision value: [0.375 0.7 0.7 0.42857143 0.5 0.8 0.66666667 0.5 0.625 0.6 ] mean value: 0.5895238095238096 key: train_precision value: [0.66197183 0.67605634 0.66197183 0.64473684 0.72580645 0.65333333 0.69117647 0.66666667 0.67142857 0.6875 ] mean value: 0.6740648335734973 key: test_recall value: [0.42857143 0.875 1. 
0.42857143 0.57142857 0.57142857 0.85714286 0.57142857 0.71428571 0.42857143] mean value: 0.6446428571428571 key: train_recall value: [0.734375 0.76190476 0.734375 0.765625 0.703125 0.765625 0.734375 0.6875 0.734375 0.6875 ] mean value: 0.7308779761904762 key: test_roc_auc value: [0.40178571 0.72321429 0.78571429 0.42857143 0.5 0.71428571 0.71428571 0.5 0.64285714 0.57142857] mean value: 0.5982142857142857 key: train_roc_auc value: [0.67671131 0.70126488 0.6796875 0.671875 0.71875 0.6796875 0.703125 0.671875 0.6875 0.6875 ] mean value: 0.687797619047619 key: test_jcc value: [0.25 0.63636364 0.7 0.27272727 0.36363636 0.5 0.6 0.36363636 0.5 0.33333333] mean value: 0.45196969696969697 key: train_jcc value: [0.53409091 0.55813953 0.53409091 0.53846154 0.55555556 0.54444444 0.55294118 0.51162791 0.54022989 0.52380952] mean value: 0.5393391383841405 MCC on Blind test: 0.39 Accuracy on Blind test: 0.69 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', 
ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.06879473 0.03695536 0.03799057 0.03917074 0.03711796 0.03794122 0.04741096 0.0387218 0.04068446 0.03751087] mean value: 0.042229866981506346 key: score_time value: [0.00955296 0.00969815 0.0103898 0.0105443 0.01052046 0.01191258 0.01033878 0.01035762 0.01077509 0.0104599 ] mean value: 0.010454964637756348 key: test_mcc value: [0.66143783 1. 1. 0.8660254 0.71428571 1. 1. 0.57735027 0.8660254 0.71428571] mean value: 0.8399410333096079 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8 1. 1. 0.92857143 0.85714286 1. 1. 0.78571429 0.92857143 0.85714286] mean value: 0.9157142857142857 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82352941 1. 1. 0.92307692 0.85714286 1. 1. 0.76923077 0.93333333 0.85714286] mean value: 0.9163456151691446 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7 1. 1. 1. 0.85714286 1. 1. 0.83333333 0.875 0.85714286] mean value: 0.9122619047619047 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.85714286 0.85714286 1. 1. 0.71428571 1. 0.85714286] mean value: 0.9285714285714286 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8125 1. 1. 0.92857143 0.85714286 1. 1. 0.78571429 0.92857143 0.85714286] mean value: 0.9169642857142858 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.7 1. 1. 0.85714286 0.75 1. 1. 
0.625 0.875 0.75 ] mean value: 0.8557142857142856 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: -0.04 Accuracy on Blind test: 0.48 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01021028 0.01137257 0.01115346 0.01160097 0.01162314 0.01166773 0.01157808 0.01191115 0.01153421 0.01156759] mean value: 0.011421918869018555 key: score_time value: [0.01013207 0.01012731 0.01019645 0.01036048 0.01039767 0.01046324 0.01033401 0.01034856 0.01044607 0.01029301] mean value: 0.010309886932373048 key: test_mcc value: [0.66143783 0.46428571 0.8660254 0.74535599 0.63245553 0.57735027 0.8660254 0.71428571 0.28867513 0.4472136 ] mean value: 0.6263110587724456 key: train_mcc value: [0.93745372 0.90550595 0.92198755 0.9375 0.92198755 0.95324137 0.95417386 0.9379581 0.95324137 0.90669283] mean value: 0.9329742313821212 key: test_accuracy value: [0.8 0.73333333 0.92857143 0.85714286 0.78571429 0.78571429 0.92857143 0.85714286 0.64285714 0.71428571] mean value: 0.8033333333333333 key: train_accuracy value: [0.96850394 0.95275591 0.9609375 0.96875 0.9609375 0.9765625 0.9765625 0.96875 0.9765625 0.953125 ] mean value: 
0.9663447342519685 key: test_fscore value: [0.82352941 0.75 0.92307692 0.83333333 0.72727273 0.76923077 0.92307692 0.85714286 0.66666667 0.66666667] mean value: 0.7939996278231571 key: train_fscore value: [0.96923077 0.95238095 0.96062992 0.96875 0.96062992 0.97674419 0.97709924 0.96923077 0.97674419 0.95384615] mean value: 0.9665286095942575 key: test_precision value: [0.7 0.75 1. 1. 1. 0.83333333 1. 0.85714286 0.625 0.8 ] mean value: 0.856547619047619 key: train_precision value: [0.95454545 0.95238095 0.96825397 0.96875 0.96825397 0.96923077 0.95522388 0.95454545 0.96923077 0.93939394] mean value: 0.9599809156432291 key: test_recall value: [1. 0.75 0.85714286 0.71428571 0.57142857 0.71428571 0.85714286 0.85714286 0.71428571 0.57142857] mean value: 0.7607142857142857 key: train_recall value: [0.984375 0.95238095 0.953125 0.96875 0.953125 0.984375 1. 0.984375 0.984375 0.96875 ] mean value: 0.9733630952380953 key: test_roc_auc value: [0.8125 0.73214286 0.92857143 0.85714286 0.78571429 0.78571429 0.92857143 0.85714286 0.64285714 0.71428571] mean value: 0.8044642857142857 key: train_roc_auc value: [0.96837798 0.95275298 0.9609375 0.96875 0.9609375 0.9765625 0.9765625 0.96875 0.9765625 0.953125 ] mean value: 0.9663318452380952 key: test_jcc value: [0.7 0.6 0.85714286 0.71428571 0.57142857 0.625 0.85714286 0.75 0.5 0.5 ] mean value: 0.6675 key: train_jcc value: [0.94029851 0.90909091 0.92424242 0.93939394 0.92424242 0.95454545 0.95522388 0.94029851 0.95454545 0.91176471] mean value: 0.9353646207465347 MCC on Blind test: 0.02 Accuracy on Blind test: 0.51 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 
'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01899028 0.00702119 0.00694108 0.00665832 0.00675297 0.00682497 0.00661945 0.00672174 0.00678945 0.00670505] mean value: 0.008002448081970214 key: score_time value: [0.01023316 0.00807333 0.0078764 0.00824809 0.00777531 0.00779223 0.0078361 0.00779271 0.00800109 0.00779223] mean value: 0.00814206600189209 key: test_mcc value: [ 0.56407607 0.6000992 0.40824829 0.14285714 0. 0.4472136 0.28867513 -0.1490712 0. -0.1490712 ] mean value: 0.21530270393825499 key: train_mcc value: [0.40417056 0.34191645 0.4113018 0.47245559 0.39067269 0.34646743 0.29691125 0.43943537 0.4429404 0.39105486] mean value: 0.39373264045213846 key: test_accuracy value: [0.73333333 0.8 0.64285714 0.57142857 0.5 0.71428571 0.64285714 0.42857143 0.5 0.42857143] mean value: 0.5961904761904762 key: train_accuracy value: [0.7007874 0.66929134 0.703125 0.734375 0.6953125 0.671875 0.6484375 0.71875 0.71875 0.6953125 ] mean value: 0.695601624015748 key: test_fscore value: [0.77777778 0.82352941 0.73684211 0.57142857 0.58823529 0.66666667 0.66666667 0.33333333 0.53333333 0.33333333] mean value: 0.6031146493685193 key: train_fscore value: [0.72058824 0.68656716 0.72463768 0.75 0.69767442 0.69117647 0.65116279 0.73134328 0.73913043 0.70229008] mean value: 0.7094570555223779 key: test_precision value: [0.63636364 0.77777778 0.58333333 0.57142857 0.5 0.8 0.625 0.4 0.5 0.4 ] mean value: 0.579390331890332 key: train_precision value: [0.68055556 0.64788732 0.67567568 0.70833333 0.69230769 0.65277778 0.64615385 0.7 0.68918919 0.68656716] 
mean value: 0.6779447558115836 key: test_recall value: [1. 0.875 1. 0.57142857 0.71428571 0.57142857 0.71428571 0.28571429 0.57142857 0.28571429] mean value: 0.6589285714285714 key: train_recall value: [0.765625 0.73015873 0.78125 0.796875 0.703125 0.734375 0.65625 0.765625 0.796875 0.71875 ] mean value: 0.744890873015873 key: test_roc_auc value: [0.75 0.79464286 0.64285714 0.57142857 0.5 0.71428571 0.64285714 0.42857143 0.5 0.42857143] mean value: 0.5973214285714286 key: train_roc_auc value: [0.70027282 0.66976687 0.703125 0.734375 0.6953125 0.671875 0.6484375 0.71875 0.71875 0.6953125 ] mean value: 0.6955977182539682 key: test_jcc value: [0.63636364 0.7 0.58333333 0.4 0.41666667 0.5 0.5 0.2 0.36363636 0.2 ] mean value: 0.45 key: train_jcc value: [0.56321839 0.52272727 0.56818182 0.6 0.53571429 0.52808989 0.48275862 0.57647059 0.5862069 0.54117647] mean value: 0.5504544231133333 MCC on Blind test: 0.38 Accuracy on Blind test: 0.69 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, 
importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00755644 0.00730038 0.00778127 0.00782037 0.00711942 0.00752926 0.00717807 0.0077436 0.00730443 0.00738955] mean value: 0.00747227668762207 key: score_time value: [0.00840449 0.00777626 0.00787759 0.00786304 0.00776672 0.00783229 0.00777388 0.00788474 0.00788546 0.00791168] mean value: 0.007897615432739258 key: test_mcc value: [0.56407607 0.60714286 0.57735027 0.40824829 0.40824829 0.1490712 1. 0.28867513 0.57735027 0.31622777] mean value: 0.48963901503792373 key: train_mcc value: [0.69592496 0.84250992 0.92198755 0.72374686 0.60141677 0.78756153 0.62554324 0.84375 0.8226036 0.7617394 ] mean value: 0.7626783838444016 key: test_accuracy value: [0.73333333 0.8 0.78571429 0.64285714 0.64285714 0.57142857 1. 0.64285714 0.78571429 0.64285714] mean value: 0.7247619047619047 key: train_accuracy value: [0.82677165 0.92125984 0.9609375 0.84375 0.765625 0.8828125 0.78125 0.921875 0.90625 0.8671875 ] mean value: 0.8677718996062992 key: test_fscore value: [0.77777778 0.8 0.76923077 0.44444444 0.44444444 0.625 1. 0.61538462 0.76923077 0.70588235] mean value: 0.6951395173453997 key: train_fscore value: [0.85333333 0.92063492 0.96124031 0.81481481 0.69387755 0.8951049 0.82051282 0.921875 0.89830508 0.88275862] mean value: 0.866245735093413 key: test_precision value: [0.63636364 0.85714286 0.83333333 1. 1. 0.55555556 1. 0.66666667 0.83333333 0.6 ] mean value: 0.7982395382395382 key: train_precision value: [0.74418605 0.92063492 0.95384615 1. 1. 0.81012658 0.69565217 0.921875 0.98148148 0.79012346] mean value: 0.8817925815455832 key: test_recall value: [1. 
0.75 0.71428571 0.28571429 0.28571429 0.71428571 1. 0.57142857 0.71428571 0.85714286] mean value: 0.6892857142857143 key: train_recall value: [1. 0.92063492 0.96875 0.6875 0.53125 1. 1. 0.921875 0.828125 1. ] mean value: 0.885813492063492 key: test_roc_auc value: [0.75 0.80357143 0.78571429 0.64285714 0.64285714 0.57142857 1. 0.64285714 0.78571429 0.64285714] mean value: 0.7267857142857143 key: train_roc_auc value: [0.82539683 0.92125496 0.9609375 0.84375 0.765625 0.8828125 0.78125 0.921875 0.90625 0.8671875 ] mean value: 0.8676339285714285 key: test_jcc value: [0.63636364 0.66666667 0.625 0.28571429 0.28571429 0.45454545 1. 0.44444444 0.625 0.54545455] mean value: 0.5568903318903319 key: train_jcc value: [0.74418605 0.85294118 0.92537313 0.6875 0.53125 0.81012658 0.69565217 0.85507246 0.81538462 0.79012346] mean value: 0.7707609649444953 MCC on Blind test: 0.28 Accuracy on Blind test: 0.63 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', 
learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00977015 0.00938821 0.00748348 0.00760007 0.00707746 0.00716782 0.00706244 0.00752711 0.00705504 0.00710464] mean value: 0.0077236413955688475 key: score_time value: [0.01031303 0.00984597 0.00794601 0.00786161 0.00812817 0.00785327 0.00777006 0.008039 0.0077517 0.0078032 ] mean value: 0.008331203460693359 key: test_mcc value: [0.46770717 0.76376262 0.57735027 0.57735027 0.52223297 0.74535599 0.8660254 0.40824829 0.57735027 0.4472136 ] mean value: 0.5952596846856876 key: train_mcc value: [0.70849191 0.58496906 0.8138413 0.8542422 0.63764677 0.60141677 0.81409158 0.72374686 0.72932496 0.90669283] mean value: 0.7374464237869991 key: test_accuracy value: [0.66666667 0.86666667 0.78571429 0.78571429 0.71428571 0.85714286 0.92857143 0.64285714 0.78571429 0.71428571] mean value: 0.7747619047619048 key: train_accuracy value: [0.83464567 0.75590551 0.8984375 0.921875 0.7890625 0.765625 0.90625 0.84375 0.8515625 0.953125 ] mean value: 0.8520238681102362 key: test_fscore value: [0.73684211 0.85714286 0.8 0.76923077 0.6 0.83333333 0.93333333 0.73684211 0.76923077 0.75 ] mean value: 0.7785955272797378 key: train_fscore value: [0.8590604 0.67368421 0.90780142 0.92753623 0.73267327 0.69387755 0.90909091 0.86486486 0.82882883 0.95384615] mean value: 0.8351263838512551 key: test_precision value: [0.58333333 1. 0.75 0.83333333 1. 1. 0.875 0.58333333 0.83333333 0.66666667] mean value: 0.8125 key: train_precision value: [0.75294118 1. 0.83116883 0.86486486 1. 1. 0.88235294 0.76190476 0.9787234 0.93939394] mean value: 0.9011349919234776 key: test_recall value: [1. 
0.75 0.85714286 0.71428571 0.42857143 0.71428571 1. 1. 0.71428571 0.85714286] mean value: 0.8035714285714286 key: train_recall value: [1. 0.50793651 1. 1. 0.578125 0.53125 0.9375 1. 0.71875 0.96875 ] mean value: 0.8242311507936508 key: test_roc_auc value: [0.6875 0.875 0.78571429 0.78571429 0.71428571 0.85714286 0.92857143 0.64285714 0.78571429 0.71428571] mean value: 0.7776785714285714 key: train_roc_auc value: [0.83333333 0.75396825 0.8984375 0.921875 0.7890625 0.765625 0.90625 0.84375 0.8515625 0.953125 ] mean value: 0.8516989087301587 key: test_jcc value: [0.58333333 0.75 0.66666667 0.625 0.42857143 0.71428571 0.875 0.58333333 0.625 0.6 ] mean value: 0.6451190476190476 key: train_jcc value: [0.75294118 0.50793651 0.83116883 0.86486486 0.578125 0.53125 0.83333333 0.76190476 0.70769231 0.91176471] mean value: 0.7280981489253548 MCC on Blind test: 0.16 Accuracy on Blind test: 0.57 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, 
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.0766685 0.06215739 0.06248546 0.06247211 0.06284809 0.0624218 0.06243992 0.06250739 0.06269693 0.06271696] mean value: 0.06394145488739014 key: score_time value: [0.01422691 0.0138514 0.01398492 0.01401758 0.01408148 0.01418185 0.01409602 0.01414037 0.01412201 0.01409864] mean value: 0.014080119132995606 key: test_mcc value: [0.66143783 1. 0.8660254 1. 0.8660254 0.8660254 1. 0.28867513 0.71428571 0.57735027] mean value: 0.7839825157189617 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8 1. 0.92857143 1. 0.92857143 0.92857143 1. 0.64285714 0.85714286 0.78571429] mean value: 0.8871428571428571 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82352941 1. 0.92307692 1. 0.93333333 0.93333333 1. 0.66666667 0.85714286 0.76923077] mean value: 0.8906313294548589 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7 1. 1. 1. 0.875 0.875 1. 0.625 0.85714286 0.83333333] mean value: 0.876547619047619 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.85714286 1. 1. 1. 1. 0.71428571 0.85714286 0.71428571] mean value: 0.9142857142857143 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8125 1. 0.92857143 1. 0.92857143 0.92857143 1. 0.64285714 0.85714286 0.78571429] mean value: 0.8883928571428572 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.7 1. 0.85714286 1. 0.875 0.875 1. 
0.5 0.75 0.625 ] mean value: 0.8182142857142857 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: -0.06 Accuracy on Blind test: 0.47 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian 
Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.02604771 0.02509189 0.03977823 0.03567004 0.04593778 0.0436461 0.04469895 0.03649855 0.02204108 0.02417731] mean value: 0.034358763694763185 key: score_time value: [0.01960158 0.01544213 0.03531432 0.02577138 0.03630829 0.03665662 0.03008604 0.02398133 0.01645422 0.02618575] mean value: 0.026580166816711426 key: test_mcc value: [0.66143783 0.875 1. 0.8660254 0.4472136 0.71428571 1. 0.31622777 0.71428571 0.8660254 ] mean value: 0.7460501425423249 key: train_mcc value: [1. 0.9689752 0.96922337 1. 0.95324137 1. 1. 1. 1. 0.98449518] mean value: 0.9875935124101023 key: test_accuracy value: [0.8 0.93333333 1. 0.92857143 0.71428571 0.85714286 1. 0.64285714 0.85714286 0.92857143] mean value: 0.8661904761904762 key: train_accuracy value: [1. 0.98425197 0.984375 1. 0.9765625 1. 1. 1. 1. 0.9921875 ] mean value: 0.9937376968503937 key: test_fscore value: [0.82352941 0.93333333 1. 
0.92307692 0.66666667 0.85714286 1. 0.70588235 0.85714286 0.92307692] mean value: 0.8689851325145442 key: train_fscore value: [1. 0.98387097 0.98412698 1. 0.97637795 1. 1. 1. 1. 0.99212598] mean value: 0.9936501888876793 key: test_precision value: [0.7 1. 1. 1. 0.8 0.85714286 1. 0.6 0.85714286 1. ] mean value: 0.8814285714285715 key: train_precision value: [1. 1. 1. 1. 0.98412698 1. 1. 1. 1. 1. ] mean value: 0.9984126984126984 key: test_recall value: [1. 0.875 1. 0.85714286 0.57142857 0.85714286 1. 0.85714286 0.85714286 0.85714286] mean value: 0.8732142857142857 key: train_recall value: [1. 0.96825397 0.96875 1. 0.96875 1. 1. 1. 1. 0.984375 ] mean value: 0.9890128968253968 key: test_roc_auc value: [0.8125 0.9375 1. 0.92857143 0.71428571 0.85714286 1. 0.64285714 0.85714286 0.92857143] mean value: 0.8678571428571429 key: train_roc_auc value: [1. 0.98412698 0.984375 1. 0.9765625 1. 1. 1. 1. 0.9921875 ] mean value: 0.9937251984126985 key: test_jcc value: [0.7 0.875 1. 0.85714286 0.5 0.75 1. 0.54545455 0.75 0.85714286] mean value: 0.7834740259740259 key: train_jcc value: [1. 0.96825397 0.96875 1. 0.95384615 1. 1. 1. 1. 
0.984375 ] mean value: 0.9875225122100122 MCC on Blind test: 0.07 Accuracy on Blind test: 0.53 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.029562 0.03599644 0.0356493 0.03583241 0.035743 0.01903772 0.02019906 0.02188826 0.0147891 0.04324818] mean value: 0.02919454574584961 key: score_time value: [0.02010798 0.01939225 0.01914334 0.01897407 0.01091695 0.01093078 0.01089573 0.01088786 0.01081705 0.01087356] mean value: 0.014293956756591796 key: test_mcc value: [ 0.60714286 0.46428571 0.14285714 0.28867513 -0.1490712 0.4472136 0.28867513 0.14285714 0. 0. 
] mean value: 0.22326355233324546 key: train_mcc value: [0.95287698 0.96850198 0.95324137 0.96922337 0.9379581 0.98449518 0.92198755 0.95417386 0.95417386 0.96875 ] mean value: 0.9565382271774051 key: test_accuracy value: [0.8 0.73333333 0.57142857 0.64285714 0.42857143 0.71428571 0.64285714 0.57142857 0.5 0.5 ] mean value: 0.6104761904761905 key: train_accuracy value: [0.97637795 0.98425197 0.9765625 0.984375 0.96875 0.9921875 0.9609375 0.9765625 0.9765625 0.984375 ] mean value: 0.9780942421259843 key: test_fscore value: [0.8 0.75 0.57142857 0.61538462 0.33333333 0.66666667 0.61538462 0.57142857 0.46153846 0.36363636] mean value: 0.5748801198801199 key: train_fscore value: [0.97637795 0.98412698 0.97637795 0.98412698 0.96825397 0.99224806 0.96062992 0.976 0.976 0.984375 ] mean value: 0.9778516825295094 key: test_precision value: [0.75 0.75 0.57142857 0.66666667 0.4 0.8 0.66666667 0.57142857 0.5 0.5 ] mean value: 0.6176190476190476 key: train_precision value: [0.98412698 0.98412698 0.98412698 1. 0.98387097 0.98461538 0.96825397 1. 1. 0.984375 ] mean value: 0.987349627299224 key: test_recall value: [0.85714286 0.75 0.57142857 0.57142857 0.28571429 0.57142857 0.57142857 0.57142857 0.42857143 0.28571429] mean value: 0.5464285714285714 key: train_recall value: [0.96875 0.98412698 0.96875 0.96875 0.953125 1. 
0.953125 0.953125 0.953125 0.984375 ] mean value: 0.9687251984126984 key: test_roc_auc value: [0.80357143 0.73214286 0.57142857 0.64285714 0.42857143 0.71428571 0.64285714 0.57142857 0.5 0.5 ] mean value: 0.6107142857142857 key: train_roc_auc value: [0.97643849 0.98425099 0.9765625 0.984375 0.96875 0.9921875 0.9609375 0.9765625 0.9765625 0.984375 ] mean value: 0.9781001984126985 key: test_jcc value: [0.66666667 0.6 0.4 0.44444444 0.2 0.5 0.44444444 0.4 0.3 0.22222222] mean value: 0.41777777777777775 key: train_jcc value: [0.95384615 0.96875 0.95384615 0.96875 0.93846154 0.98461538 0.92424242 0.953125 0.953125 0.96923077] mean value: 0.9567992424242424 MCC on Blind test: 0.29 Accuracy on Blind test: 0.64 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, 
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.1164434 0.10044718 0.10379887 0.09790683 0.10323715 0.10379982 0.10525608 0.10259223 0.1029501 0.10186577] mean value: 0.10382974147796631 key: score_time value: [0.00966978 0.00841308 0.00894523 0.00920153 0.00936794 0.00911546 0.00893807 0.00919414 0.00939775 0.00913119] mean value: 0.009137415885925293 key: test_mcc value: [0.66143783 1. 1. 0.8660254 0.8660254 0.8660254 1. 
0.4472136 0.8660254 0.42857143] mean value: 0.8001324466975289 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8 1. 1. 0.92857143 0.92857143 0.92857143 1. 0.71428571 0.92857143 0.71428571] mean value: 0.8942857142857144 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82352941 1. 1. 0.92307692 0.93333333 0.93333333 1. 0.75 0.93333333 0.71428571] mean value: 0.9010892049127344 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7 1. 1. 1. 0.875 0.875 1. 0.66666667 0.875 0.71428571] mean value: 0.8705952380952381 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.85714286 1. 1. 1. 0.85714286 1. 0.71428571] mean value: 0.9428571428571428 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8125 1. 1. 0.92857143 0.92857143 0.92857143 1. 0.71428571 0.92857143 0.71428571] mean value: 0.8955357142857143 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.7 1. 1. 0.85714286 0.875 0.875 1. 0.6 0.875 0.55555556] mean value: 0.8337698412698412 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.04 Accuracy on Blind test: 0.51 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.00956202 0.01067328 0.01087284 0.01171899 0.0111239 0.01345611 0.01117897 0.01136255 0.011199 0.01894307] mean value: 0.012009072303771972 key: score_time value: [0.01031303 0.01022744 0.01023555 0.01084542 0.01065707 0.01072264 0.01063037 0.01066041 0.0107913 0.01101661] mean value: 0.010609984397888184 key: test_mcc value: [0.66143783 0.18898224 0.40824829 0.52223297 0.4472136 0.28867513 0.52223297 0.17407766 0.2773501 0.17407766] mean value: 0.36645284305875925 key: train_mcc value: [0.70849191 0.59989919 0.64978629 0.57735027 0.76571848 0.73658951 0.58937969 0.7617394 0.71641857 0.71125407] mean value: 0.6816627369382001 key: test_accuracy value: [0.8 0.6 0.64285714 0.71428571 0.71428571 0.64285714 0.71428571 0.57142857 0.57142857 0.57142857] mean value: 0.6542857142857142 key: train_accuracy value: [0.83464567 0.76377953 0.796875 0.75 0.8828125 0.859375 0.7578125 0.8671875 0.84375 0.8359375 ] mean value: 0.8192175196850393 key: test_fscore value: [0.82352941 0.66666667 0.73684211 0.77777778 0.66666667 0.61538462 0.77777778 0.66666667 0.7 0.66666667] mean value: 0.7097978354634701 key: train_fscore value: [0.8590604 0.80769231 0.83116883 0.8 0.88188976 
0.87323944 0.80503145 0.88275862 0.8630137 0.8590604 ] mean value: 0.8462914910490185 key: test_precision value: [0.7 0.6 0.58333333 0.63636364 0.8 0.66666667 0.63636364 0.54545455 0.53846154 0.54545455] mean value: 0.6252097902097902 key: train_precision value: [0.75294118 0.67741935 0.71111111 0.66666667 0.88888889 0.79487179 0.67368421 0.79012346 0.76829268 0.75294118] mean value: 0.7476940519561616 key: test_recall value: [1. 0.75 1. 1. 0.57142857 0.57142857 1. 0.85714286 1. 0.85714286] mean value: 0.8607142857142857 key: train_recall value: [1. 1. 1. 1. 0.875 0.96875 1. 1. 0.984375 1. ] mean value: 0.9828125 key: test_roc_auc value: [0.8125 0.58928571 0.64285714 0.71428571 0.71428571 0.64285714 0.71428571 0.57142857 0.57142857 0.57142857] mean value: 0.6544642857142857 key: train_roc_auc value: [0.83333333 0.765625 0.796875 0.75 0.8828125 0.859375 0.7578125 0.8671875 0.84375 0.8359375 ] mean value: 0.8192708333333334 key: test_jcc value: [0.7 0.5 0.58333333 0.63636364 0.5 0.44444444 0.63636364 0.5 0.53846154 0.5 ] mean value: 0.5538966588966588 key: train_jcc value: [0.75294118 0.67741935 0.71111111 0.66666667 0.78873239 0.775 0.67368421 0.79012346 0.75903614 0.75294118] mean value: 0.7347655691818613 MCC on Blind test: 0.27 Accuracy on Blind test: 0.64 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01034141 0.01005673 0.00851965 0.00838447 0.00835919 0.00818896 0.00820589 0.00816202 0.00818706 0.00815248] mean value: 0.008655786514282227 key: score_time value: [0.01045799 0.00901937 0.00879979 0.00867701 0.00855756 0.00852752 0.00862861 0.00858521 0.00861907 0.00860906] mean value: 0.008848118782043456 key: test_mcc value: [0.66143783 0.76376262 0.8660254 0.63245553 0.74535599 0.74535599 1. 0.42857143 0.74535599 0.42857143] mean value: 0.7016892214052882 key: train_mcc value: [0.87447286 0.88988095 0.85947992 0.875 0.87542756 0.90669283 0.87542756 0.84375 0.89073374 0.89073374] mean value: 0.8781599163560809 key: test_accuracy value: [0.8 0.86666667 0.92857143 0.78571429 0.85714286 0.85714286 1. 0.71428571 0.85714286 0.71428571] mean value: 0.8380952380952381 key: train_accuracy value: [0.93700787 0.94488189 0.9296875 0.9375 0.9375 0.953125 0.9375 0.921875 0.9453125 0.9453125 ] mean value: 0.9389702263779528 key: test_fscore value: [0.82352941 0.85714286 0.92307692 0.72727273 0.83333333 0.83333333 1. 0.71428571 0.875 0.71428571] mean value: 0.8301260014495309 key: train_fscore value: [0.93650794 0.94488189 0.92913386 0.9375 0.93846154 0.95384615 0.93846154 0.921875 0.94573643 0.94573643] mean value: 0.9392140783525718 key: test_precision value: [0.7 1. 1. 1. 1. 1. 1. 0.71428571 0.77777778 0.71428571] mean value: 0.8906349206349207 key: train_precision value: [0.9516129 0.9375 0.93650794 0.9375 0.92424242 0.93939394 0.92424242 0.921875 0.93846154 0.93846154] mean value: 0.9349797704535607 key: test_recall value: [1. 0.75 0.85714286 0.57142857 0.71428571 0.71428571 1. 
0.71428571 1. 0.71428571] mean value: 0.8035714285714286 key: train_recall value: [0.921875 0.95238095 0.921875 0.9375 0.953125 0.96875 0.953125 0.921875 0.953125 0.953125 ] mean value: 0.9436755952380952 key: test_roc_auc value: [0.8125 0.875 0.92857143 0.78571429 0.85714286 0.85714286 1. 0.71428571 0.85714286 0.71428571] mean value: 0.8401785714285714 key: train_roc_auc value: [0.93712798 0.94494048 0.9296875 0.9375 0.9375 0.953125 0.9375 0.921875 0.9453125 0.9453125 ] mean value: 0.9389880952380952 key: test_jcc value: [0.7 0.75 0.85714286 0.57142857 0.71428571 0.71428571 1. 0.55555556 0.77777778 0.55555556] mean value: 0.7196031746031746 key: train_jcc value: [0.88059701 0.89552239 0.86764706 0.88235294 0.88405797 0.91176471 0.88405797 0.85507246 0.89705882 0.89705882] mean value: 0.8855190161723353 MCC on Blind test: 0.21 Accuracy on Blind test: 0.6 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, 
min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.07449079 0.06415701 0.06411934 0.06436348 0.06481528 0.06425691 0.06450534 0.06399703 0.0644269 0.06456614] mean value: 0.0653698205947876 key: score_time value: [0.00915241 0.00884914 0.0087738 0.00880289 0.00888062 0.00877213 0.00884771 0.00880075 0.00886726 0.00878549] mean value: 0.00885322093963623 key: test_mcc value: [0.66143783 0.76376262 0.8660254 0.63245553 0.63245553 0.74535599 1. 0.42857143 0.74535599 0.42857143] mean value: 0.6903991753586628 key: train_mcc value: [0.87447286 0.88988095 0.85947992 0.875 0.92198755 0.95417386 0.87542756 0.84375 0.89073374 0.89073374] mean value: 0.887564019240932 key: test_accuracy value: [0.8 0.86666667 0.92857143 0.78571429 0.78571429 0.85714286 1. 
0.71428571 0.85714286 0.71428571] mean value: 0.830952380952381 key: train_accuracy value: [0.93700787 0.94488189 0.9296875 0.9375 0.9609375 0.9765625 0.9375 0.921875 0.9453125 0.9453125 ] mean value: 0.9436577263779528 /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:183: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:186: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: test_fscore value: [0.82352941 0.85714286 0.92307692 0.72727273 0.72727273 0.83333333 1. 0.71428571 0.875 0.71428571] mean value: 0.8195199408434702 key: train_fscore value: [0.93650794 0.94488189 0.92913386 0.9375 0.96124031 0.97709924 0.93846154 0.921875 0.94573643 0.94573643] mean value: 0.9438172637936766 key: test_precision value: [0.7 1. 1. 1. 1. 1. 1. 0.71428571 0.77777778 0.71428571] mean value: 0.8906349206349207 key: train_precision value: [0.9516129 0.9375 0.93650794 0.9375 0.95384615 0.95522388 0.92424242 0.921875 0.93846154 0.93846154] mean value: 0.9395231375342413 key: test_recall value: [1. 0.75 0.85714286 0.57142857 0.57142857 0.71428571 1. 0.71428571 1. 0.71428571] mean value: 0.7892857142857143 key: train_recall value: [0.921875 0.95238095 0.921875 0.9375 0.96875 1. 0.953125 0.921875 0.953125 0.953125 ] mean value: 0.9483630952380953 key: test_roc_auc value: [0.8125 0.875 0.92857143 0.78571429 0.78571429 0.85714286 1.
0.71428571 0.85714286 0.71428571] mean value: 0.8330357142857143 key: train_roc_auc value: [0.93712798 0.94494048 0.9296875 0.9375 0.9609375 0.9765625 0.9375 0.921875 0.9453125 0.9453125 ] mean value: 0.9436755952380952 key: test_jcc value: [0.7 0.75 0.85714286 0.57142857 0.57142857 0.71428571 1. 0.55555556 0.77777778 0.55555556] mean value: 0.7053174603174603 key: train_jcc value: [0.88059701 0.89552239 0.86764706 0.88235294 0.92537313 0.95522388 0.88405797 0.85507246 0.89705882 0.89705882] mean value: 0.893996449975188 MCC on Blind test: 0.08 Accuracy on Blind test: 0.54 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, 
validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.02492595 0.01881099 0.02069116 0.02081108 0.0185163 0.02902484 0.02047849 0.03224087 0.03128767 0.02006626] mean value: 0.023685359954833986 key: score_time value: [0.0105257 0.0105207 0.01081491 0.01050901 0.01055932 0.01063681 0.01051402 0.01091695 0.01072741 0.010885 ] mean value: 0.010660982131958008 key: test_mcc value: [0.48075018 0.56818182 0.56490196 0.47727273 0.91605722 0.74242424 0.83743579 0.82575758 0.91287093 0.54772256] mean value: 0.6873374995187688 key: train_mcc value: [0.82452636 0.74645342 0.7859188 0.7954287 0.73693234 0.78548989 0.75613935 0.78536075 0.76756932 0.79615403] mean value: 0.777997295379433 key: test_accuracy value: [0.73913043 0.7826087 0.7826087 0.73913043 0.95652174 0.86956522 0.91304348 0.91304348 0.95454545 0.77272727] mean value: 0.8422924901185771 key: train_accuracy value: [0.91219512 0.87317073 0.89268293 0.89756098 0.86829268 0.89268293 0.87804878 0.89268293 0.88349515 0.89805825] mean value: 0.8888870471228985 key: test_fscore value: [0.7 0.7826087 0.76190476 0.72727273 0.96 0.86956522 0.92307692 0.91666667 0.95652174 0.76190476] mean value: 0.8359521492999754 key: train_fscore value: [0.91346154 0.875 0.8952381 0.89952153 0.86956522 0.89108911 0.87804878 0.89215686 0.88571429 0.89855072] mean value: 0.8898346144687178 key: test_precision value: [0.77777778 0.75 0.8 0.72727273 0.92307692 0.90909091 0.85714286 0.91666667 0.91666667 0.8 ] mean value: 0.8377694527694528 key: train_precision value: [0.9047619 0.86666667 0.87850467 0.88679245 0.85714286 0.9 0.87378641 0.89215686 0.86915888 0.89423077] mean value: 0.8823201472546344 
key: test_recall value: [0.63636364 0.81818182 0.72727273 0.72727273 1. 0.83333333 1. 0.91666667 1. 0.72727273] mean value: 0.8386363636363636 key: train_recall value: [0.9223301 0.88349515 0.91262136 0.91262136 0.88235294 0.88235294 0.88235294 0.89215686 0.90291262 0.90291262] mean value: 0.8976108890158006 key: test_roc_auc value: [0.73484848 0.78409091 0.78030303 0.73863636 0.95454545 0.87121212 0.90909091 0.91287879 0.95454545 0.77272727] mean value: 0.8412878787878788 key: train_roc_auc value: [0.91214544 0.87312012 0.89258519 0.89748715 0.86836094 0.89263278 0.87806967 0.89268037 0.88349515 0.89805825] mean value: 0.8888635065676757 key: test_jcc value: [0.53846154 0.64285714 0.61538462 0.57142857 0.92307692 0.76923077 0.85714286 0.84615385 0.91666667 0.61538462] mean value: 0.7295787545787545 key: train_jcc value: [0.84070796 0.77777778 0.81034483 0.8173913 0.76923077 0.80357143 0.7826087 0.80530973 0.79487179 0.81578947] mean value: 0.8017603770837232 MCC on Blind test: 0.35 Accuracy on Blind test: 0.67 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.64007044 0.59848499 0.60545135 0.81422853 0.62158179 0.65589404 0.78231335 0.678689 0.63352537 0.76259565] mean value: 0.6792834520339965 key: score_time value: [0.01374793 0.01383209 0.01083827 0.01381516 0.01120448 0.01409101 0.01422381 0.01409006 0.01428652 0.01422524] mean value: 0.013435459136962891 key: test_mcc value: [0.65909298 0.74242424 0.74047959 0.56818182 0.83971912 0.82575758 0.65151515 0.74242424 0.68313005 0.73029674] mean value: 0.7183021519902297 key: train_mcc value: [0.90310636 1. 0.88308106 1. 0.88292404 0.88361919 0.86358877 0.94146202 0.99033794 0.89358299] mean value: 0.9241702374683562 key: test_accuracy value: [0.82608696 0.86956522 0.86956522 0.7826087 0.91304348 0.91304348 0.82608696 0.86956522 0.81818182 0.86363636] mean value: 0.8551383399209486 key: train_accuracy value: [0.95121951 1. 0.94146341 1. 
0.94146341 0.94146341 0.93170732 0.97073171 0.99514563 0.94660194] mean value: 0.9619796353303338 key: test_fscore value: [0.8 0.86956522 0.85714286 0.7826087 0.90909091 0.91666667 0.83333333 0.86956522 0.77777778 0.86956522] mean value: 0.848531589183763 key: train_fscore value: [0.95238095 1. 0.94230769 1. 0.94117647 0.94230769 0.93203883 0.97058824 0.99512195 0.94736842] mean value: 0.962329025010229 key: test_precision value: [0.88888889 0.83333333 0.9 0.75 1. 0.91666667 0.83333333 0.90909091 1. 0.83333333] mean value: 0.8864646464646465 key: train_precision value: [0.93457944 1. 0.93333333 1. 0.94117647 0.9245283 0.92307692 0.97058824 1. 0.93396226] mean value: 0.9561244967582682 key: test_recall value: [0.72727273 0.90909091 0.81818182 0.81818182 0.83333333 0.91666667 0.83333333 0.83333333 0.63636364 0.90909091] mean value: 0.8234848484848485 key: train_recall value: [0.97087379 1. 0.95145631 1. 0.94117647 0.96078431 0.94117647 0.97058824 0.99029126 0.96116505] mean value: 0.9687511897963069 key: test_roc_auc value: [0.8219697 0.87121212 0.86742424 0.78409091 0.91666667 0.91287879 0.82575758 0.87121212 0.81818182 0.86363636] mean value: 0.8553030303030303 key: train_roc_auc value: [0.95112317 1. 0.94141443 1. 0.94146202 0.94155721 0.93175328 0.97073101 0.99514563 0.94660194] mean value: 0.96197886921759 key: test_jcc value: [0.66666667 0.76923077 0.75 0.64285714 0.83333333 0.84615385 0.71428571 0.76923077 0.63636364 0.76923077] mean value: 0.7397352647352647 key: train_jcc value: [0.90909091 1. 0.89090909 1. 
0.88888889 0.89090909 0.87272727 0.94285714 0.99029126 0.9 ] mean value: 0.9285673657518317 MCC on Blind test: 0.2 Accuracy on Blind test: 0.59 Model_name: Gaussian NB Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.00964332 0.00924993 0.00782824 0.00768685 0.00750494 0.00749516 0.007514 0.00766039 0.00757623 0.00769448] mean value: 0.007985353469848633 key: score_time value: [0.01074767 0.00942302 0.00893474 0.0085392 0.00857997 0.00853729 0.00858474 0.00858116 0.0085609 0.00856519] mean value: 0.008905386924743653 key: test_mcc value: [0.44411739 0.50460839 0.2096648 0.23262105 0.40451992 0.65909298 0.47923384 0.62050523 0.39735971 0.20412415] mean value: 0.4155847461790167 key: train_mcc value: [0.39137259 0.44043936 0.45968386 0.4798642 0.4267072 0.43504485 0.45392287 0.42888555 0.44151079 0.46358632] mean value: 0.4421017579498864 key: test_accuracy value: [0.69565217 0.69565217 0.56521739 0.60869565 0.65217391 0.82608696 0.69565217 0.7826087 0.63636364 0.59090909] mean value: 0.674901185770751 key: train_accuracy value: [0.63414634 0.68780488 0.70243902 0.70731707 0.67804878 0.68292683 0.69756098 0.68292683 0.69417476 0.7038835 ]
mean value: 0.6871228984134502 key: test_fscore value: [0.74074074 0.75862069 0.66666667 0.64 0.75 0.84615385 0.77419355 0.82758621 0.73333333 0.66666667] mean value: 0.7403961698500074 key: train_fscore value: [0.73309609 0.75384615 0.76078431 0.76744186 0.74615385 0.74903475 0.75590551 0.74708171 0.75294118 0.76078431] mean value: 0.7527069722703967 key: test_precision value: [0.625 0.61111111 0.52631579 0.57142857 0.6 0.78571429 0.63157895 0.70588235 0.57894737 0.5625 ] mean value: 0.6198478426458303 key: train_precision value: [0.57865169 0.62420382 0.63815789 0.63870968 0.61392405 0.61783439 0.63157895 0.61935484 0.63157895 0.63815789] mean value: 0.6232152152926238 key: test_recall value: [0.90909091 1. 0.90909091 0.72727273 1. 0.91666667 1. 1. 1. 0.81818182] mean value: 0.928030303030303 key: train_recall value: [1. 0.95145631 0.94174757 0.96116505 0.95098039 0.95098039 0.94117647 0.94117647 0.93203883 0.94174757] mean value: 0.9512469065296021 key: test_roc_auc value: [0.70454545 0.70833333 0.57954545 0.61363636 0.63636364 0.8219697 0.68181818 0.77272727 0.63636364 0.59090909] mean value: 0.6746212121212122 key: train_roc_auc value: [0.63235294 0.68651247 0.70126594 0.70607272 0.67937369 0.68422806 0.69874358 0.68418047 0.69417476 0.7038835 ] mean value: 0.6870788121073672 key: test_jcc value: [0.58823529 0.61111111 0.5 0.47058824 0.6 0.73333333 0.63157895 0.70588235 0.57894737 0.5 ] mean value: 0.591967664258686 key: train_jcc value: [0.57865169 0.60493827 0.61392405 0.62264151 0.59509202 0.59876543 0.60759494 0.59627329 0.60377358 0.61392405] mean value: 0.6035578837876612 MCC on Blind test: 0.48 Accuracy on Blind test: 0.71 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2',
'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00798321 0.00766301 0.00776434 0.00779152 0.0078249 0.00776362 0.0078249 0.00781822 0.00786686 0.00770688] mean value: 0.007800745964050293 key: score_time value: [0.00858808 0.00862932 0.00854731 0.00855184 0.00859928 0.00858569 0.00869465 0.0087173 0.0086596 0.00857878] mean value: 0.00861518383026123 key: test_mcc value: [ 0.3030303 0.15096491 -0.03816905 0.3030303 0.39727608 0.56818182 0.39727608 0.31252706 0.29277002 0.09245003] mean value: 0.27793375505710965 key: train_mcc value: [0.37046449 0.38910743 0.39476736 0.38236392 0.38354703 0.35891522 0.35302365 0.36367161 0.37290762 0.39345795] mean value: 0.37622262820367175 key: test_accuracy value: [0.65217391 0.56521739 0.47826087 0.65217391 0.69565217 0.7826087 0.69565217 0.65217391 0.63636364 0.54545455] mean value: 0.6355731225296443 key: train_accuracy value: [0.68292683 0.69268293 0.69268293 0.68780488 0.68780488 0.67804878 0.67317073 0.67804878 0.68446602 0.68932039] mean value: 0.6846957139474308 key: test_fscore value: [0.63636364 0.61538462 0.5 0.63636364 0.74074074 0.7826087 0.74074074 0.71428571 0.69230769 0.58333333] mean value: 0.6642128805172284 key: train_fscore value: [0.70852018 0.71493213 0.72489083 0.71681416 0.71428571 0.69444444 0.69955157 0.70535714 0.70588235 0.72649573] mean value: 0.7111174245586319 key: test_precision value: [0.63636364 0.53333333 0.46153846 0.63636364 0.66666667 0.81818182 
0.66666667 0.625 0.6 0.53846154] mean value: 0.6182575757575758 key: train_precision value: [0.65833333 0.66949153 0.65873016 0.65853659 0.6557377 0.65789474 0.6446281 0.64754098 0.66101695 0.64885496] mean value: 0.6560765038377927 key: test_recall value: [0.63636364 0.72727273 0.54545455 0.63636364 0.83333333 0.75 0.83333333 0.83333333 0.81818182 0.63636364] mean value: 0.725 key: train_recall value: [0.76699029 0.76699029 0.80582524 0.78640777 0.78431373 0.73529412 0.76470588 0.7745098 0.75728155 0.82524272] mean value: 0.7767561393489435 key: test_roc_auc value: [0.65151515 0.5719697 0.48106061 0.65151515 0.68939394 0.78409091 0.68939394 0.64393939 0.63636364 0.54545455] mean value: 0.634469696969697 key: train_roc_auc value: [0.68251475 0.69231868 0.69212831 0.68732153 0.68827337 0.67832667 0.67361508 0.67851704 0.68446602 0.68932039] mean value: 0.6846801827527127 key: test_jcc value: [0.46666667 0.44444444 0.33333333 0.46666667 0.58823529 0.64285714 0.58823529 0.55555556 0.52941176 0.41176471] mean value: 0.5027170868347339 key: train_jcc value: [0.54861111 0.55633803 0.56849315 0.55862069 0.55555556 0.53191489 0.53793103 0.54482759 0.54545455 0.5704698 ] mean value: 0.5518216393594725 MCC on Blind test: 0.47 Accuracy on Blind test: 0.73 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00768375 0.00718713 0.0075891 0.00747037 0.00748372 0.00737071 0.00746584 0.00747824 0.00745106 0.00744772] mean value: 0.007462763786315918 key: score_time value: [0.00988078 0.00973558 0.00988078 0.00987172 0.01486492 0.00991106 0.00977206 0.00987005 0.00975847 0.00983 ] mean value: 0.010337543487548829 key: test_mcc value: [-0.05427825 0.13740858 0.56818182 0.56490196 0.31252706 0.58930667 0.31298622 0.74242424 0.2773501 0.09245003] mean value: 0.35432584250263544 key: train_mcc value: [0.66217798 0.6392382 0.62934402 0.59038553 0.67133261 0.6310448 0.68889027 0.65854355 0.59234469 0.68222103] mean value: 0.6445522695261332 key: test_accuracy value: [0.47826087 0.56521739 0.7826087 0.7826087 0.65217391 0.7826087 0.65217391 0.86956522 0.63636364 0.54545455] mean value: 0.674703557312253 key: train_accuracy value: [0.82926829 0.8195122 0.81463415 0.79512195 0.83414634 0.81463415 0.84390244 0.82926829 0.7961165 0.83980583] mean value: 0.8216410134975136 key: test_fscore value: [0.4 0.58333333 0.7826087 0.76190476 0.71428571 0.76190476 0.63636364 0.86956522 0.6 0.5 ] mean value: 0.6609966120835686 key: train_fscore value: [0.83870968 0.82296651 0.81730769 0.79411765 0.82474227 0.80612245 0.83838384 0.82758621 0.79411765 0.84651163] mean value: 0.8210565561229923 key: test_precision value: [0.44444444 0.53846154 0.75 0.8 0.625 0.88888889 0.7 0.90909091 0.66666667 0.55555556] mean value: 0.6878108003108003 key: train_precision value: [0.79824561 0.81132075 0.80952381 0.8019802 0.86956522 0.84042553 0.86458333 0.83168317 
0.8019802 0.8125 ] mean value: 0.8241807825271845 key: test_recall value: [0.36363636 0.63636364 0.81818182 0.72727273 0.83333333 0.66666667 0.58333333 0.83333333 0.54545455 0.45454545] mean value: 0.6462121212121212 key: train_recall value: [0.88349515 0.83495146 0.82524272 0.78640777 0.78431373 0.7745098 0.81372549 0.82352941 0.78640777 0.88349515] mean value: 0.8196078431372549 key: test_roc_auc value: [0.47348485 0.56818182 0.78409091 0.78030303 0.64393939 0.78787879 0.65530303 0.87121212 0.63636364 0.54545455] mean value: 0.6746212121212121 key: train_roc_auc value: [0.82900247 0.81943651 0.81458214 0.79516467 0.83390444 0.81443937 0.84375595 0.82924043 0.7961165 0.83980583] mean value: 0.8215448315248429 key: test_jcc value: [0.25 0.41176471 0.64285714 0.61538462 0.55555556 0.61538462 0.46666667 0.76923077 0.42857143 0.33333333] mean value: 0.508874883286648 key: train_jcc value: [0.72222222 0.69918699 0.69105691 0.65853659 0.70175439 0.67521368 0.72173913 0.70588235 0.65853659 0.73387097] mean value: 0.6967999807689436 MCC on Blind test: 0.3 Accuracy on Blind test: 0.65 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01029658 0.01014757 0.01019192 0.01032376 0.01028013 0.01028109 0.01010323 0.01022696 0.01013231 0.01031971] mean value: 0.010230326652526855 key: score_time value: [0.00922775 0.00909877 0.0091269 0.00924611 0.00912356 0.00912118 0.00901008 0.00927448 0.00904131 0.00910091] mean value: 0.009137105941772462 key: test_mcc value: [0.56490196 0.65151515 0.38932432 0.38932432 0.42228828 0.66414149 0.65909298 0.82575758 0.64715023 0.46225016] mean value: 0.5675746466667577 key: train_mcc value: [0.80487341 0.72698715 0.75693529 0.76584809 0.67808871 0.71711403 0.70747264 0.72814868 0.73789886 0.74884444] mean value: 0.7372211285541519 key: test_accuracy value: [0.7826087 0.82608696 0.69565217 0.69565217 0.69565217 0.82608696 0.82608696 0.91304348 0.81818182 0.72727273] mean value: 0.7806324110671937 key: train_accuracy value: [0.90243902 0.86341463 0.87804878 0.88292683 0.83902439 0.85853659 0.85365854 0.86341463 0.86893204 0.87378641] mean value: 0.8684181861236088 key: test_fscore value: [0.76190476 0.81818182 0.66666667 0.66666667 0.75862069 0.81818182 0.84615385 0.91666667 0.8 0.7 ] mean value: 0.7753042934077417 key: train_fscore value: [0.90291262 0.8627451 0.88151659 0.88349515 0.83902439 0.85853659 0.85436893 0.86666667 0.86956522 0.87735849] mean value: 0.8696189734979832 key: test_precision value: [0.8 0.81818182 0.7 0.7 0.64705882 0.9 0.78571429 0.91666667 0.88888889 0.77777778] mean value: 0.7934288260758849 key: train_precision value: [0.90291262 0.87128713 0.86111111 0.88349515 0.83495146 0.85436893 0.84615385 
0.84259259 0.86538462 0.85321101] mean value: 0.8615468458469154 key: test_recall value: [0.72727273 0.81818182 0.63636364 0.63636364 0.91666667 0.75 0.91666667 0.91666667 0.72727273 0.63636364] mean value: 0.7681818181818182 key: train_recall value: [0.90291262 0.85436893 0.90291262 0.88349515 0.84313725 0.8627451 0.8627451 0.89215686 0.87378641 0.90291262] mean value: 0.8781172663240053 key: test_roc_auc value: [0.78030303 0.82575758 0.69318182 0.69318182 0.68560606 0.82954545 0.8219697 0.91287879 0.81818182 0.72727273] mean value: 0.7787878787878787 key: train_roc_auc value: [0.9024367 0.86345898 0.8779269 0.88292404 0.83904436 0.85855702 0.85370265 0.86355416 0.86893204 0.87378641] mean value: 0.8684323243860652 key: test_jcc value: [0.61538462 0.69230769 0.5 0.5 0.61111111 0.69230769 0.73333333 0.84615385 0.66666667 0.53846154] mean value: 0.6395726495726496 key: train_jcc value: [0.82300885 0.75862069 0.78813559 0.79130435 0.72268908 0.75213675 0.74576271 0.76470588 0.76923077 0.78151261] mean value: 0.7697107276516258 MCC on Blind test: 0.45 Accuracy on Blind test: 0.72 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.3924427 0.61908412 0.74449182 0.91650462 0.63582444 0.65240908 0.77556181 0.61309528 0.7518034 0.7737174 ] mean value: 0.6874934673309326 key: score_time value: [0.0111208 0.01100135 0.01523304 0.01314092 0.01097083 0.01089597 0.01093626 0.01093769 0.01482415 0.01095867] mean value: 0.012001967430114746 key: test_mcc value: [0.31252706 0.58930667 0.69084928 0.56818182 0.65909298 0.74047959 0.83743579 0.82575758 0.81818182 0.46225016] mean value: 0.6504062738835646 key: train_mcc value: [0.7606076 0.7863314 0.86610349 0.91330072 0.82498132 0.87660499 0.79068188 0.88440807 0.88499797 0.91300871] mean value: 0.8501026166060419 key: test_accuracy value: [0.65217391 0.7826087 0.82608696 0.7826087 0.82608696 0.86956522 0.91304348 0.91304348 0.90909091 0.72727273] mean value: 0.8201581027667985 key: train_accuracy value: [0.87804878 0.88780488 0.93170732 0.95609756 0.91219512 0.93658537 0.89268293 0.94146341 0.94174757 
0.95631068] mean value: 0.9234643618280843 key: test_fscore value: [0.55555556 0.8 0.77777778 0.7826087 0.84615385 0.88 0.92307692 0.91666667 0.90909091 0.7 ] mean value: 0.8090930373973853 key: train_fscore value: [0.87179487 0.89686099 0.92929293 0.95522388 0.91 0.93896714 0.88541667 0.93939394 0.94339623 0.9569378 ] mean value: 0.9227284435900899 key: test_precision value: [0.71428571 0.71428571 1. 0.75 0.78571429 0.84615385 0.85714286 0.91666667 0.90909091 0.77777778] mean value: 0.8271117771117771 key: train_precision value: [0.92391304 0.83333333 0.96842105 0.97959184 0.92857143 0.9009009 0.94444444 0.96875 0.91743119 0.94339623] mean value: 0.9308753459170286 key: test_recall value: [0.45454545 0.90909091 0.63636364 0.81818182 0.91666667 0.91666667 1. 0.91666667 0.90909091 0.63636364] mean value: 0.8113636363636364 key: train_recall value: [0.82524272 0.97087379 0.89320388 0.93203883 0.89215686 0.98039216 0.83333333 0.91176471 0.97087379 0.97087379] mean value: 0.9180753854940035 key: test_roc_auc value: [0.64393939 0.78787879 0.81818182 0.78409091 0.8219697 0.86742424 0.90909091 0.91287879 0.90909091 0.72727273] mean value: 0.8181818181818181 key: train_roc_auc value: [0.87830763 0.88739768 0.93189606 0.9562155 0.91209785 0.93679802 0.89239482 0.94131925 0.94174757 0.95631068] mean value: 0.9234485056158386 key: test_jcc value: [0.38461538 0.66666667 0.63636364 0.64285714 0.73333333 0.78571429 0.85714286 0.84615385 0.83333333 0.53846154] mean value: 0.6924642024642025 key: train_jcc value: [0.77272727 0.81300813 0.86792453 0.91428571 0.83486239 0.88495575 0.79439252 0.88571429 0.89285714 0.91743119] mean value: 0.8578158927526129 MCC on Blind test: 0.28 Accuracy on Blind test: 0.64 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', 
BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 
'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01156855 0.00892901 0.00856614 0.0077455 0.00787854 0.00857687 0.00829124 0.00852704 0.00854969 0.00859594] mean value: 0.008722853660583497 key: score_time value: [0.01051092 0.00886154 0.00785089 0.00785041 0.008111 0.00851727 0.00845528 0.00843167 0.0078907 0.00845909] mean value: 0.008493876457214356 key: test_mcc value: [0.74242424 1. 0.91605722 0.66414149 0.83971912 0.74242424 0.74047959 0.74242424 0.83205029 0.73029674] mean value: 0.7950017190609769 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 1. 0.95652174 0.82608696 0.91304348 0.86956522 0.86956522 0.86956522 0.90909091 0.86363636] mean value: 0.8946640316205533 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.86956522 1. 0.95238095 0.83333333 0.90909091 0.86956522 0.88 0.86956522 0.9 0.85714286] mean value: 0.8940643704121964 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.83333333 1. 1. 0.76923077 1. 0.90909091 0.84615385 0.90909091 1. 0.9 ] mean value: 0.9166899766899766 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 1. 
0.90909091 0.90909091 0.83333333 0.83333333 0.91666667 0.83333333 0.81818182 0.81818182] mean value: 0.878030303030303 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.87121212 1. 0.95454545 0.82954545 0.91666667 0.87121212 0.86742424 0.87121212 0.90909091 0.86363636] mean value: 0.8954545454545455 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.76923077 1. 0.90909091 0.71428571 0.83333333 0.76923077 0.78571429 0.76923077 0.81818182 0.75 ] mean value: 0.8118298368298369 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.04 Accuracy on Blind test: 0.51 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, 
reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.08520198 0.08628082 0.08537316 0.08609509 0.08660245 0.08801889 0.08706832 0.08594346 0.08755255 0.08668995] mean value: 0.08648266792297363 key: score_time value: [0.01721334 0.01664853 0.01649141 0.0179987 0.01672387 0.01804256 0.016927 0.01826859 0.01752901 0.01691222] mean value: 0.017275524139404298 key: test_mcc value: [0.56490196 1. 
0.83743579 0.6992059 0.69084928 0.82575758 0.83743579 0.91605722 1. 0.81818182] mean value: 0.8189825331284214 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.7826087 1. 0.91304348 0.82608696 0.82608696 0.91304348 0.91304348 0.95652174 1. 0.90909091] mean value: 0.9039525691699605 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.76190476 1. 0.9 0.84615385 0.85714286 0.91666667 0.92307692 0.96 1. 0.90909091] mean value: 0.9074035964035964 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 1. 1. 0.73333333 0.75 0.91666667 0.85714286 0.92307692 1. 0.90909091] mean value: 0.8889310689310689 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 1. 0.81818182 1. 1. 0.91666667 1. 1. 1. 0.90909091] mean value: 0.9371212121212121 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.78030303 1. 0.90909091 0.83333333 0.81818182 0.91287879 0.90909091 0.95454545 1. 0.90909091] mean value: 0.9026515151515151 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.61538462 1. 0.81818182 0.73333333 0.75 0.84615385 0.85714286 0.92307692 1. 0.83333333] mean value: 0.8376606726606727 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.33 Accuracy on Blind test: 0.63 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.0074172 0.00699353 0.00702 0.0070374 0.00713778 0.00709534 0.00710583 0.00709867 0.00728655 0.00706387] mean value: 0.0071256160736083984 key: score_time value: [0.00813842 0.00795412 0.00787568 0.00795722 0.00794506 0.00791264 0.0079174 0.00833249 0.00816393 0.00792503] mean value: 0.008012199401855468 key: test_mcc value: [0.48075018 0.39727608 0.65909298 0.48856385 0.56490196 0.82575758 0.56818182 0.56818182 0.20412415 0.54772256] mean value: 0.5304552960001759 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73913043 0.69565217 0.82608696 0.73913043 0.7826087 0.91304348 0.7826087 0.7826087 0.59090909 0.77272727] mean value: 0.7624505928853755 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.7 0.63157895 0.8 0.75 0.8 0.91666667 0.7826087 0.7826087 0.47058824 0.76190476] mean value: 0.7395956002538315 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.77777778 0.75 0.88888889 0.69230769 0.76923077 0.91666667 0.81818182 0.81818182 0.66666667 0.8 ] mean value: 0.7897902097902098 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.63636364 0.54545455 0.72727273 0.81818182 0.83333333 0.91666667 0.75 0.75 0.36363636 0.72727273] mean value: 0.7068181818181818 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.73484848 0.68939394 0.8219697 0.74242424 0.78030303 0.91287879 0.78409091 0.78409091 0.59090909 0.77272727] mean value: 0.7613636363636364 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.53846154 0.46153846 0.66666667 0.6 0.66666667 0.84615385 0.64285714 0.64285714 0.30769231 0.61538462] mean value: 0.5988278388278389 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.24 Accuracy on Blind test: 0.62 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.15525389 1.24308133 1.07740855 1.09325242 1.08529234 1.07614017 1.08734107 1.08477783 1.07814384 1.07799172] mean value: 1.1058683156967164 key: score_time value: [0.09643054 0.09530497 0.09550691 0.09580946 0.09401822 0.09396243 0.09032536 0.09210563 0.08807588 0.09230471] mean value: 0.09338440895080566 key: test_mcc value: [0.56490196 1. 0.91605722 0.76764947 0.91666667 0.82575758 0.83743579 0.91605722 1. 0.91287093] mean value: 0.8657396839505284 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.7826087 1. 0.95652174 0.86956522 0.95652174 0.91304348 0.91304348 0.95652174 1. 0.95454545] mean value: 0.9302371541501976 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.76190476 1. 0.95238095 0.88 0.95652174 0.91666667 0.92307692 0.96 1. 0.95652174] mean value: 0.9307072782290173 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 1. 1. 0.78571429 1. 0.91666667 0.85714286 0.92307692 1. 0.91666667] mean value: 0.9199267399267399 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 1. 0.90909091 1. 0.91666667 0.91666667 1. 1. 1. 1. ] mean value: 0.946969696969697 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.78030303 1. 0.95454545 0.875 0.95833333 0.91287879 0.90909091 0.95454545 1. 0.95454545] mean value: 0.9299242424242424 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.61538462 1. 0.90909091 0.78571429 0.91666667 0.84615385 0.85714286 0.92307692 1. 0.91666667] mean value: 0.876989676989677 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.16 Accuracy on Blind test: 0.56 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, 
subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.86088943 0.88546109 0.83543372 0.87413931 0.92852831 0.89648938 0.97815275 0.92712474 0.90053868 0.94448614] mean value: 0.9031243562698364 key: score_time value: [0.25403523 0.22465062 0.24028587 0.24344826 0.16241813 0.23810482 0.23798776 0.16599536 0.18922591 0.23433256] mean value: 0.21904845237731935 key: test_mcc value: [0.48075018 0.91666667 0.82575758 0.47727273 1. 0.74242424 0.83743579 0.91605722 1. 
0.64715023] mean value: 0.7843514630610502 key: train_mcc value: [0.90516294 0.89609853 0.89781488 0.91325992 0.91435567 0.92355447 0.91435567 0.92355447 0.89663335 0.89663335] mean value: 0.9081423249606505 key: test_accuracy value: [0.73913043 0.95652174 0.91304348 0.73913043 1. 0.86956522 0.91304348 0.95652174 1. 0.81818182] mean value: 0.8905138339920948 key: train_accuracy value: [0.95121951 0.94634146 0.94634146 0.95609756 0.95609756 0.96097561 0.95609756 0.96097561 0.94660194 0.94660194] mean value: 0.952735022495856 key: test_fscore value: [0.7 0.95652174 0.90909091 0.72727273 1. 0.86956522 0.92307692 0.96 1. 0.83333333] mean value: 0.8878860849295632 key: train_fscore value: [0.95327103 0.94883721 0.94930876 0.95734597 0.95734597 0.96190476 0.95734597 0.96190476 0.94883721 0.94883721] mean value: 0.9544938850206197 key: test_precision value: [0.77777778 0.91666667 0.90909091 0.72727273 1. 0.90909091 0.85714286 0.92307692 1. 0.76923077] mean value: 0.8789349539349539 key: train_precision value: [0.91891892 0.91071429 0.90350877 0.93518519 0.9266055 0.93518519 0.9266055 0.93518519 0.91071429 0.91071429] mean value: 0.9213337112721468 key: test_recall value: [0.63636364 1. 0.90909091 0.72727273 1. 0.83333333 1. 1. 1. 0.90909091] mean value: 0.9015151515151515 key: train_recall value: [0.99029126 0.99029126 1. 0.98058252 0.99019608 0.99019608 0.99019608 0.99019608 0.99029126 0.99029126] mean value: 0.9902531886541024 key: test_roc_auc value: [0.73484848 0.95833333 0.91287879 0.73863636 1. 0.87121212 0.90909091 0.95454545 1. 0.81818182] mean value: 0.8897727272727273 key: train_roc_auc value: [0.95102798 0.94612602 0.94607843 0.95597754 0.95626309 0.96111746 0.95626309 0.96111746 0.94660194 0.94660194] mean value: 0.9527174947648962 key: test_jcc value: [0.53846154 0.91666667 0.83333333 0.57142857 1. 0.76923077 0.85714286 0.92307692 1. 
0.71428571] mean value: 0.8123626373626374 key: train_jcc value: [0.91071429 0.90265487 0.90350877 0.91818182 0.91818182 0.9266055 0.91818182 0.9266055 0.90265487 0.90265487] mean value: 0.9129944123133789 MCC on Blind test: 0.34 Accuracy on Blind test: 0.64 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01725674 0.00764561 0.00769782 0.00771165 0.00768232 0.00783229 0.00767946 0.00749207 0.00788879 0.00778747] mean value: 0.008667421340942384 key: score_time value: [0.01506519 0.00864601 0.00875545 0.00867391 0.00863385 0.00854945 0.00867081 0.00864387 0.00859499 0.00872564] mean value: 0.009295916557312012 key: test_mcc value: [ 0.3030303 0.15096491 -0.03816905 0.3030303 0.39727608 0.56818182 0.39727608 0.31252706 0.29277002 0.09245003] mean value: 0.27793375505710965 key: train_mcc value: [0.37046449 0.38910743 0.39476736 0.38236392 0.38354703 0.35891522 0.35302365 0.36367161 0.37290762 0.39345795] mean value: 0.37622262820367175 key: test_accuracy value: [0.65217391 0.56521739 0.47826087 0.65217391 0.69565217 0.7826087 0.69565217 0.65217391 0.63636364 0.54545455] mean value: 0.6355731225296443 
key: train_accuracy value: [0.68292683 0.69268293 0.69268293 0.68780488 0.68780488 0.67804878 0.67317073 0.67804878 0.68446602 0.68932039] mean value: 0.6846957139474308 key: test_fscore value: [0.63636364 0.61538462 0.5 0.63636364 0.74074074 0.7826087 0.74074074 0.71428571 0.69230769 0.58333333] mean value: 0.6642128805172284 key: train_fscore value: [0.70852018 0.71493213 0.72489083 0.71681416 0.71428571 0.69444444 0.69955157 0.70535714 0.70588235 0.72649573] mean value: 0.7111174245586319 key: test_precision value: [0.63636364 0.53333333 0.46153846 0.63636364 0.66666667 0.81818182 0.66666667 0.625 0.6 0.53846154] mean value: 0.6182575757575758 key: train_precision value: [0.65833333 0.66949153 0.65873016 0.65853659 0.6557377 0.65789474 0.6446281 0.64754098 0.66101695 0.64885496] mean value: 0.6560765038377927 key: test_recall value: [0.63636364 0.72727273 0.54545455 0.63636364 0.83333333 0.75 0.83333333 0.83333333 0.81818182 0.63636364] mean value: 0.725 key: train_recall value: [0.76699029 0.76699029 0.80582524 0.78640777 0.78431373 0.73529412 0.76470588 0.7745098 0.75728155 0.82524272] mean value: 0.7767561393489435 key: test_roc_auc value: [0.65151515 0.5719697 0.48106061 0.65151515 0.68939394 0.78409091 0.68939394 0.64393939 0.63636364 0.54545455] mean value: 0.634469696969697 key: train_roc_auc value: [0.68251475 0.69231868 0.69212831 0.68732153 0.68827337 0.67832667 0.67361508 0.67851704 0.68446602 0.68932039] mean value: 0.6846801827527127 key: test_jcc value: [0.46666667 0.44444444 0.33333333 0.46666667 0.58823529 0.64285714 0.58823529 0.55555556 0.52941176 0.41176471] mean value: 0.5027170868347339 key: train_jcc value: [0.54861111 0.55633803 0.56849315 0.55862069 0.55555556 0.53191489 0.53793103 0.54482759 0.54545455 0.5704698 ] mean value: 0.5518216393594725 MCC on Blind test: 0.47 Accuracy on Blind test: 0.73 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, 
random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.11559343 0.039253 0.03851128 0.17964649 0.04505944 0.04687715 0.03830266 0.04024267 0.03952813 0.04023385] mean value: 0.062324810028076175 key: score_time value: [0.0100162 0.01036406 0.01036692 0.01054835 0.00991821 0.00990653 0.00961637 0.00992942 0.00987625 0.01000285] mean value: 0.010054516792297363 key: test_mcc value: [0.65151515 1. 0.91605722 0.76764947 0.83971912 0.83971912 0.76277007 0.91605722 1. 0.81818182] mean value: 0.8511669209848822 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.82608696 1. 0.95652174 0.86956522 0.91304348 0.91304348 0.86956522 0.95652174 1. 
0.90909091] mean value: 0.9213438735177866 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.81818182 1. 0.95238095 0.88 0.90909091 0.90909091 0.88888889 0.96 1. 0.90909091] mean value: 0.9226724386724386 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.81818182 1. 1. 0.78571429 1. 1. 0.8 0.92307692 1. 0.90909091] mean value: 0.9236063936063936 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 1. 0.90909091 1. 0.83333333 0.83333333 1. 1. 1. 0.90909091] mean value: 0.9303030303030303 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.82575758 1. 0.95454545 0.875 0.91666667 0.91666667 0.86363636 0.95454545 1. 0.90909091] mean value: 0.9215909090909091 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.69230769 1. 0.90909091 0.78571429 0.83333333 0.83333333 0.8 0.92307692 1. 0.83333333] mean value: 0.861018981018981 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.07 Accuracy on Blind test: 0.52 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01018047 0.03178215 0.03211904 0.0325737 0.03071642 0.03224421 0.03228498 0.0325532 0.03256488 0.03023434] mean value: 0.02972533702850342 key: score_time value: [0.01017213 0.02084589 0.02090144 0.01349664 0.02145767 0.02162623 0.01061773 0.01060104 0.01897764 0.02124476] mean value: 0.016994118690490723 key: test_mcc value: [0.58002308 0.65151515 0.56490196 0.83971912 0.83971912 0.91666667 0.74047959 0.82575758 0.83205029 0.73029674] mean value: 0.7521129297642005 key: train_mcc value: [0.87352395 0.87320324 0.86356283 0.83418999 0.88310329 0.83418999 0.84389872 0.85370265 0.83499081 0.84481947] mean value: 0.8539184935656506 key: test_accuracy value: [0.7826087 0.82608696 0.7826087 0.91304348 0.91304348 0.95652174 0.86956522 0.91304348 0.90909091 0.86363636] mean value: 0.8729249011857707 key: train_accuracy value: [0.93658537 0.93658537 0.93170732 0.91707317 0.94146341 0.91707317 0.92195122 0.92682927 0.91747573 0.9223301 ] mean value: 0.9269074117925645 key: test_fscore value: [0.73684211 0.81818182 0.76190476 0.91666667 0.90909091 0.95652174 
0.88 0.91666667 0.9 0.86956522] mean value: 0.8665439884295719 key: train_fscore value: [0.93779904 0.93719807 0.93269231 0.91707317 0.94174757 0.91707317 0.92156863 0.92682927 0.9178744 0.92156863] mean value: 0.9271424251996218 key: test_precision value: [0.875 0.81818182 0.8 0.84615385 1. 1. 0.84615385 0.91666667 1. 0.83333333] mean value: 0.893548951048951 key: train_precision value: [0.9245283 0.93269231 0.92380952 0.92156863 0.93269231 0.91262136 0.92156863 0.9223301 0.91346154 0.93069307] mean value: 0.9235965760062042 key: test_recall value: [0.63636364 0.81818182 0.72727273 1. 0.83333333 0.91666667 0.91666667 0.91666667 0.81818182 0.90909091] mean value: 0.8492424242424242 key: train_recall value: [0.95145631 0.94174757 0.94174757 0.91262136 0.95098039 0.92156863 0.92156863 0.93137255 0.9223301 0.91262136] mean value: 0.9308014467923091 key: test_roc_auc value: [0.77651515 0.82575758 0.78030303 0.91666667 0.91666667 0.95833333 0.86742424 0.91287879 0.90909091 0.86363636] mean value: 0.8727272727272727 key: train_roc_auc value: [0.93651247 0.93656006 0.9316581 0.91709499 0.94150961 0.91709499 0.92194936 0.92685132 0.91747573 0.9223301 ] mean value: 0.9269036740909956 key: test_jcc value: [0.58333333 0.69230769 0.61538462 0.84615385 0.83333333 0.91666667 0.78571429 0.84615385 0.81818182 0.76923077] mean value: 0.7706460206460206 key: train_jcc value: [0.88288288 0.88181818 0.87387387 0.84684685 0.88990826 0.84684685 0.85454545 0.86363636 0.84821429 0.85454545] mean value: 0.8643118447590925 MCC on Blind test: 0.1 Accuracy on Blind test: 0.55 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 
'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01748133 0.00712228 0.00695634 0.00682926 0.00673771 0.00671387 0.00680852 0.0069313 0.00677037 0.00677538] mean value: 0.007912635803222656 key: score_time value: [0.00848055 0.0081079 0.0079267 0.0077877 0.00768209 0.00775528 0.00768805 0.00777531 0.00776768 0.00765419] mean value: 0.007862544059753418 key: test_mcc value: [0.39727608 0.33371191 0.39393939 0.21969697 0.21452908 0.66414149 0.39727608 0.48856385 0.63636364 0.27272727] mean value: 0.40182257604909155 key: train_mcc value: [0.47440586 0.49337247 0.49337247 0.46430782 0.47361912 0.44415883 0.46367706 0.44784529 0.43062816 0.4882291 ] mean value: 0.46736161990058567 key: test_accuracy value: [0.69565217 0.65217391 0.69565217 0.60869565 0.60869565 0.82608696 0.69565217 0.73913043 0.81818182 0.63636364] mean value: 0.6976284584980237 key: train_accuracy value: [0.73658537 0.74634146 0.74634146 0.73170732 0.73658537 0.72195122 0.73170732 0.72195122 0.71359223 0.74271845] mean value: 0.7329481411318968 key: test_fscore value: [0.63157895 0.69230769 0.69565217 0.60869565 0.66666667 0.81818182 0.74074074 0.72727273 0.81818182 0.63636364] mean value: 0.7035641873170477 key: train_fscore value: [0.74766355 0.75471698 0.75471698 0.74178404 0.74038462 0.72463768 0.73429952 0.73732719 0.73059361 0.75576037] mean value: 0.7421884529586577 key: test_precision value: [0.75 0.6 0.66666667 0.58333333 0.6 0.9 0.66666667 0.8 0.81818182 0.63636364] mean value: 0.7021212121212121 key: train_precision value: [0.72072072 0.73394495 0.73394495 
0.71818182 0.72641509 0.71428571 0.72380952 0.69565217 0.68965517 0.71929825] mean value: 0.7175908371535152 key: test_recall value: [0.54545455 0.81818182 0.72727273 0.63636364 0.75 0.75 0.83333333 0.66666667 0.81818182 0.63636364] mean value: 0.7181818181818181 key: train_recall value: [0.77669903 0.77669903 0.77669903 0.76699029 0.75490196 0.73529412 0.74509804 0.78431373 0.77669903 0.7961165 ] mean value: 0.7689510755758614 key: test_roc_auc value: [0.68939394 0.65909091 0.6969697 0.60984848 0.60227273 0.82954545 0.68939394 0.74242424 0.81818182 0.63636364] mean value: 0.6973484848484848 key: train_roc_auc value: [0.73638873 0.74619265 0.74619265 0.73153436 0.73667428 0.72201599 0.73177232 0.72225395 0.71359223 0.74271845] mean value: 0.7329335617742243 key: test_jcc value: [0.46153846 0.52941176 0.53333333 0.4375 0.5 0.69230769 0.58823529 0.57142857 0.69230769 0.46666667] mean value: 0.5472729476405946 key: train_jcc value: [0.59701493 0.60606061 0.60606061 0.58955224 0.58778626 0.56818182 0.58015267 0.58394161 0.57553957 0.60740741] mean value: 0.5901697707371992 MCC on Blind test: 0.43 Accuracy on Blind test: 0.71 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0074048 0.00984526 0.01072073 0.00961113 0.01026011 0.01042223 0.01001883 0.01076102 0.01011777 0.01003718] mean value: 0.00991990566253662 key: score_time value: [0.00778842 0.00981927 0.00983167 0.01007533 0.01036739 0.01042461 0.01043558 0.01044655 0.01036429 0.01048994] mean value: 0.010004305839538574 key: test_mcc value: [0.56490196 0.66414149 0.65151515 0.63327851 0.91666667 0.74047959 0.83743579 0.91605722 0.91287093 0.54772256] mean value: 0.7385069858830032 key: train_mcc value: [0.8345235 0.8345235 0.86600321 0.61725542 0.82136935 0.84332727 0.82455974 0.85570033 0.78655606 0.79179983] mean value: 0.8075618225770705 key: test_accuracy value: [0.7826087 0.82608696 0.82608696 0.7826087 0.95652174 0.86956522 0.91304348 0.95652174 0.95454545 0.77272727] mean value: 0.8640316205533597 key: train_accuracy value: [0.91707317 0.91707317 0.93170732 0.7804878 0.90731707 0.91707317 0.91219512 0.92682927 0.89320388 0.89320388] mean value: 0.8996163864551268 key: test_fscore value: [0.76190476 0.83333333 0.81818182 0.81481481 0.95652174 0.88 0.92307692 0.96 0.95652174 0.76190476] mean value: 0.8666259891477283 key: train_fscore value: [0.91625616 0.91625616 0.93457944 0.81927711 0.9124424 0.92237443 0.91262136 0.92890995 0.89215686 0.88659794] mean value: 0.904147180121348 key: test_precision value: [0.8 0.76923077 0.81818182 0.6875 1. 
0.84615385 0.85714286 0.92307692 0.91666667 0.8 ] mean value: 0.841795288045288 key: train_precision value: [0.93 0.93 0.9009009 0.69863014 0.86086957 0.86324786 0.90384615 0.89908257 0.9009901 0.94505495] mean value: 0.8832622233070796 key: test_recall value: [0.72727273 0.90909091 0.81818182 1. 0.91666667 0.91666667 1. 1. 1. 0.72727273] mean value: 0.9015151515151515 key: train_recall value: [0.90291262 0.90291262 0.97087379 0.99029126 0.97058824 0.99019608 0.92156863 0.96078431 0.88349515 0.83495146] mean value: 0.9328574148105845 key: test_roc_auc value: [0.78030303 0.82954545 0.82575758 0.79166667 0.95833333 0.86742424 0.90909091 0.95454545 0.95454545 0.77272727] mean value: 0.8643939393939394 key: train_roc_auc value: [0.91714259 0.91714259 0.93151532 0.77945936 0.90762421 0.91742814 0.91224062 0.9269941 0.89320388 0.89320388] mean value: 0.8995954692556635 key: test_jcc value: [0.61538462 0.71428571 0.69230769 0.6875 0.91666667 0.78571429 0.85714286 0.92307692 0.91666667 0.61538462] mean value: 0.7724130036630037 key: train_jcc value: [0.84545455 0.84545455 0.87719298 0.69387755 0.83898305 0.8559322 0.83928571 0.86725664 0.80530973 0.7962963 ] mean value: 0.8265043260886354 MCC on Blind test: 0.2 Accuracy on Blind test: 0.57 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01033568 0.01053357 0.01022339 0.01038384 0.01062822 0.01051664 0.01035213 0.01098609 0.01057267 0.01157546] mean value: 0.01061077117919922 key: score_time value: [0.01053452 0.01041985 0.01034975 0.01045704 0.01039529 0.01041269 0.01073003 0.01061511 0.0108068 0.0110898 ] mean value: 0.010581088066101075 key: test_mcc value: [0.33946383 0.39727608 0.65909298 0.76764947 0.83971912 0.76277007 0.76277007 0.82575758 0.83205029 0.63636364] mean value: 0.6822913134388915 key: train_mcc value: [0.74004127 0.85570033 0.72342586 0.7674294 0.80545006 0.55024014 0.7696264 0.91224062 0.81572728 0.85473156] mean value: 0.7794612920006722 key: test_accuracy value: [0.65217391 0.69565217 0.82608696 0.86956522 0.91304348 0.86956522 0.86956522 0.91304348 0.90909091 0.81818182] mean value: 0.833596837944664 key: train_accuracy value: [0.85365854 0.92682927 0.84878049 0.87317073 0.90243902 0.73170732 0.87804878 0.95609756 0.90291262 0.92718447] mean value: 0.8800828794695714 key: test_fscore value: [0.5 0.63157895 0.8 0.88 0.90909091 0.88888889 0.88888889 0.91666667 0.91666667 0.81818182] mean value: 0.8149962785752259 key: train_fscore value: [0.82954545 0.92462312 0.82681564 0.88695652 0.9 0.78764479 0.88789238 0.95609756 0.90990991 0.92610837] mean value: 0.8835593743916733 key: test_precision value: [0.8 0.75 0.88888889 0.78571429 1. 0.8 0.8 0.91666667 0.84615385 0.81818182] mean value: 0.8405605505605506 key: train_precision value: [1. 
0.95833333 0.97368421 0.80314961 0.91836735 0.64968153 0.81818182 0.95145631 0.8487395 0.94 ] mean value: 0.8861593650419807 key: test_recall value: [0.36363636 0.54545455 0.72727273 1. 0.83333333 1. 1. 0.91666667 1. 0.81818182] mean value: 0.8204545454545454 key: train_recall value: [0.70873786 0.89320388 0.7184466 0.99029126 0.88235294 1. 0.97058824 0.96078431 0.98058252 0.91262136] mean value: 0.901760898534171 key: test_roc_auc value: [0.64015152 0.68939394 0.8219697 0.875 0.91666667 0.86363636 0.86363636 0.91287879 0.90909091 0.81818182] mean value: 0.831060606060606 key: train_roc_auc value: [0.85436893 0.9269941 0.84941938 0.87259661 0.90234152 0.73300971 0.878498 0.95612031 0.90291262 0.92718447] mean value: 0.8803445650104702 key: test_jcc value: [0.33333333 0.46153846 0.66666667 0.78571429 0.83333333 0.8 0.8 0.84615385 0.84615385 0.69230769] mean value: 0.7065201465201465 key: train_jcc value: [0.70873786 0.85981308 0.7047619 0.796875 0.81818182 0.64968153 0.7983871 0.91588785 0.83471074 0.86238532] mean value: 0.7949422211940016 MCC on Blind test: 0.11 Accuracy on Blind test: 0.54 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.08366251 0.06921864 0.0717814 0.07228327 0.07036138 0.0708344 0.07182002 0.0701704 0.07077861 0.07223129] mean value: 0.07231419086456299 key: score_time value: [0.01484537 0.01417661 0.01449132 0.01528001 0.01412058 0.01532507 0.01464701 0.01492405 0.01425171 0.01426148] mean value: 0.014632320404052735 key: test_mcc value: [0.83971912 0.91605722 0.91605722 0.58930667 0.83971912 0.91666667 0.76277007 0.82575758 0.91287093 0.81818182] mean value: 0.833710642294015 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91304348 0.95652174 0.95652174 0.7826087 0.91304348 0.95652174 0.86956522 0.91304348 0.95454545 0.90909091] mean value: 0.9124505928853754 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.91666667 0.95238095 0.95238095 0.8 0.90909091 0.95652174 0.88888889 0.91666667 0.95238095 0.90909091] mean value: 0.9154068636677333 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.84615385 1. 1. 0.71428571 1. 1. 0.8 0.91666667 1. 0.90909091] mean value: 0.9186197136197136 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.90909091 0.90909091 0.90909091 0.83333333 0.91666667 1. 0.91666667 0.90909091 0.90909091] mean value: 0.9212121212121211 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.91666667 0.95454545 0.95454545 0.78787879 0.91666667 0.95833333 0.86363636 0.91287879 0.95454545 0.90909091] mean value: 0.9128787878787878 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.84615385 0.90909091 0.90909091 0.66666667 0.83333333 0.91666667 0.8 0.84615385 0.90909091 0.83333333] mean value: 0.8469580419580419 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.05 Accuracy on Blind test: 0.52 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.03033328 0.03085208 0.03747296 0.03314781 0.02555704 0.03062606 0.02538133 0.02366614 0.04046106 0.03656936] mean value: 0.031406712532043454 key: score_time value: [0.02738023 0.02815413 0.01663828 0.01588154 0.01602888 0.01523161 0.02233171 0.02261209 0.01840067 0.01769853] mean value: 0.020035767555236818 key: test_mcc value: [0.83971912 0.83743579 0.76277007 0.66414149 0.91666667 0.82575758 0.83743579 0.91605722 1. 0.91287093] mean value: 0.851285465730236 key: train_mcc value: [0.99029126 0.98067587 0.99029126 1. 1. 1. 
1. 0.99029126 0.99033794 0.99033794] mean value: 0.9932225534805602 key: test_accuracy value: [0.91304348 0.91304348 0.86956522 0.82608696 0.95652174 0.91304348 0.91304348 0.95652174 1. 0.95454545] mean value: 0.9215415019762846 key: train_accuracy value: [0.99512195 0.9902439 0.99512195 1. 1. 1. 1. 0.99512195 0.99514563 0.99514563] mean value: 0.9965901018233483 key: test_fscore value: [0.91666667 0.9 0.84210526 0.83333333 0.95652174 0.91666667 0.92307692 0.96 1. 0.95652174] mean value: 0.9204892331162354 key: train_fscore value: [0.99512195 0.99019608 0.99512195 1. 1. 1. 1. 0.99512195 0.99512195 0.99512195] mean value: 0.9965805834528934 key: test_precision value: [0.84615385 1. 1. 0.76923077 1. 0.91666667 0.85714286 0.92307692 1. 0.91666667] mean value: 0.922893772893773 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 0.99029126 1. 1. ] mean value: 0.9990291262135922 key: test_recall value: [1. 0.81818182 0.72727273 0.90909091 0.91666667 0.91666667 1. 1. 1. 1. ] mean value: 0.9287878787878788 key: train_recall value: [0.99029126 0.98058252 0.99029126 1. 1. 1. 1. 1. 0.99029126 0.99029126] mean value: 0.9941747572815534 key: test_roc_auc value: [0.91666667 0.90909091 0.86363636 0.82954545 0.95833333 0.91287879 0.90909091 0.95454545 1. 0.95454545] mean value: 0.9208333333333333 key: train_roc_auc value: [0.99514563 0.99029126 0.99514563 1. 1. 1. 1. 0.99514563 0.99514563 0.99514563] mean value: 0.9966019417475728 key: test_jcc value: [0.84615385 0.81818182 0.72727273 0.71428571 0.91666667 0.84615385 0.85714286 0.92307692 1. 0.91666667] mean value: 0.8565601065601065 key: train_jcc value: [0.99029126 0.98058252 0.99029126 1. 1. 1. 1. 
0.99029126 0.99029126 0.99029126] mean value: 0.9932038834951457 MCC on Blind test: 0.13 Accuracy on Blind test: 0.55 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.04991055 0.09063339 0.06661296 0.04403114 0.03577352 0.03729105 0.02286935 0.02291799 0.02340531 0.04758835] mean value: 0.044103360176086424 key: score_time value: [0.02371454 0.02476144 0.03154182 0.01158547 0.02154064 0.01150608 0.01171994 0.01175761 0.01140141 0.01663899] mean value: 0.017616796493530273 key: test_mcc value: [0.39727608 0.56818182 0.56490196 0.38932432 0.31252706 0.6992059 0.65151515 0.74242424 0.64715023 0.61237244] mean value: 0.558487918534798 key: train_mcc value: [0.93174679 0.94146202 0.9024367 0.92194936 0.91224062 0.89272796 0.9024367 0.94146202 0.91266437 0.93243443] mean value: 0.9191560993974353 key: test_accuracy value: [0.69565217 0.7826087 0.7826087 0.69565217 0.65217391 0.82608696 0.82608696 0.86956522 0.81818182 0.77272727] mean value: 0.7721343873517786 key: train_accuracy value: [0.96585366 0.97073171 0.95121951 0.96097561 0.95609756 0.94634146 0.95121951 0.97073171 0.95631068 0.96601942] mean value: 0.9595500828794695 key: test_fscore value: 
[0.63157895 0.7826087 0.76190476 0.66666667 0.71428571 0.8 0.83333333 0.86956522 0.8 0.70588235] mean value: 0.7565825689543552 key: train_fscore value: [0.96618357 0.97087379 0.95145631 0.96116505 0.95609756 0.94634146 0.95098039 0.97058824 0.95652174 0.96650718] mean value: 0.9596715288515447 key: test_precision value: [0.75 0.75 0.8 0.7 0.625 1. 0.83333333 0.90909091 0.88888889 1. ] mean value: 0.8256313131313131 key: train_precision value: [0.96153846 0.97087379 0.95145631 0.96116505 0.95145631 0.94174757 0.95098039 0.97058824 0.95192308 0.95283019] mean value: 0.9564559383717978 key: test_recall value: [0.54545455 0.81818182 0.72727273 0.63636364 0.83333333 0.66666667 0.83333333 0.83333333 0.72727273 0.54545455] mean value: 0.7166666666666667 key: train_recall value: [0.97087379 0.97087379 0.95145631 0.96116505 0.96078431 0.95098039 0.95098039 0.97058824 0.96116505 0.98058252] mean value: 0.9629449838187703 key: test_roc_auc value: [0.68939394 0.78409091 0.78030303 0.69318182 0.64393939 0.83333333 0.82575758 0.87121212 0.81818182 0.77272727] mean value: 0.7712121212121212 key: train_roc_auc value: [0.96582905 0.97073101 0.95121835 0.96097468 0.95612031 0.94636398 0.95121835 0.97073101 0.95631068 0.96601942] mean value: 0.9595516847515705 key: test_jcc value: [0.46153846 0.64285714 0.61538462 0.5 0.55555556 0.66666667 0.71428571 0.76923077 0.66666667 0.54545455] mean value: 0.6137640137640138 key: train_jcc value: [0.93457944 0.94339623 0.90740741 0.92523364 0.91588785 0.89814815 0.90654206 0.94285714 0.91666667 0.93518519] mean value: 0.9225903767333851 MCC on Blind test: 0.34 Accuracy on Blind test: 0.67 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', 
SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.1293633 0.12412262 0.12290406 0.12451863 0.12319613 0.12431335 0.12317395 0.12272644 0.12358046 0.12428999] mean value: 0.12421889305114746 key: score_time value: [0.00877905 0.00823951 0.00833726 0.00852513 0.00845313 0.00828242 0.00824165 0.0083313 0.00858903 0.00842285] mean value: 0.008420133590698242 key: test_mcc value: [0.74242424 0.91605722 0.91605722 0.76764947 0.83971912 0.91605722 0.83743579 0.91605722 1. 0.81818182] mean value: 0.866963934561781 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.95652174 0.95652174 0.86956522 0.91304348 0.95652174 0.91304348 0.95652174 1. 0.90909091] mean value: 0.9300395256916996 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.86956522 0.95238095 0.95238095 0.88 0.90909091 0.96 0.92307692 0.96 1. 0.90909091] mean value: 0.931558586341195 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.83333333 1. 1. 0.78571429 1. 0.92307692 0.85714286 0.92307692 1. 0.90909091] mean value: 0.9231435231435231 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.90909091 0.90909091 1. 0.83333333 1. 1. 1. 1. 0.90909091] mean value: 0.946969696969697 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.87121212 0.95454545 0.95454545 0.875 0.91666667 0.95454545 0.90909091 0.95454545 1. 0.90909091] mean value: 0.9299242424242424 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.76923077 0.90909091 0.90909091 0.78571429 0.83333333 0.92307692 0.85714286 0.92307692 1. 0.83333333] mean value: 0.8743090243090244 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.14 Accuracy on Blind test: 0.55 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.00908375 0.01190424 0.01377177 0.01136947 0.01182413 0.0117712 0.01173353 0.01190329 0.0147233 0.01190543] mean value: 0.011999011039733887 key: score_time value: [0.01050138 0.01059628 0.01065016 0.01060724 0.01079369 0.01067948 0.01063824 0.01067424 0.01086879 0.0129354 ] mean value: 0.010894489288330079 key: test_mcc value: [0.47727273 0.66414149 0.17236256 0.37057951 0.55048188 0.56490196 0.40451992 0.40451992 0.13245324 0.48795004] mean value: 0.42291832347791636 key: train_mcc value: [0.5185658 0.60463182 0.61253896 0.61919584 0.62634721 0.57825573 0.42798979 0.54305523 0.59064979 0.61850654] mean value: 0.5739736713126434 key: test_accuracy value: [0.73913043 0.82608696 0.56521739 
0.65217391 0.73913043 0.7826087 0.65217391 0.65217391 0.54545455 0.72727273] mean value: 0.6881422924901186 key: train_accuracy value: [0.72682927 0.8 0.7902439 0.78536585 0.78536585 0.76585366 0.65365854 0.73658537 0.77669903 0.77669903] mean value: 0.7597300497276818 key: test_fscore value: [0.72727273 0.83333333 0.64285714 0.71428571 0.8 0.8 0.75 0.75 0.66666667 0.76923077] mean value: 0.7453646353646354 key: train_fscore value: [0.78125 0.81278539 0.82008368 0.82113821 0.82113821 0.80327869 0.74181818 0.78740157 0.80991736 0.81746032] mean value: 0.8016271610878589 key: test_precision value: [0.72727273 0.76923077 0.52941176 0.58823529 0.66666667 0.76923077 0.6 0.6 0.52631579 0.66666667] mean value: 0.6443030447364813 key: train_precision value: [0.65359477 0.76724138 0.72058824 0.70629371 0.70138889 0.69014085 0.58959538 0.65789474 0.70503597 0.69127517] mean value: 0.6883049077672215 key: test_recall value: [0.72727273 0.90909091 0.81818182 0.90909091 1. 0.83333333 1. 1. 0.90909091 0.90909091] mean value: 0.9015151515151515 key: train_recall value: [0.97087379 0.86407767 0.95145631 0.98058252 0.99019608 0.96078431 1. 0.98039216 0.95145631 1. 
] mean value: 0.9649819150961355 key: test_roc_auc value: [0.73863636 0.82954545 0.57575758 0.66287879 0.72727273 0.78030303 0.63636364 0.63636364 0.54545455 0.72727273] mean value: 0.6859848484848485 key: train_roc_auc value: [0.72563297 0.79968589 0.78945365 0.78440891 0.78636018 0.76679992 0.65533981 0.73776889 0.77669903 0.77669903] mean value: 0.7598848277174948 key: test_jcc value: [0.57142857 0.71428571 0.47368421 0.55555556 0.66666667 0.66666667 0.6 0.6 0.5 0.625 ] mean value: 0.597328738512949 key: train_jcc value: [0.64102564 0.68461538 0.69503546 0.69655172 0.69655172 0.67123288 0.58959538 0.64935065 0.68055556 0.69127517] mean value: 0.6695789560036107 MCC on Blind test: 0.35 Accuracy on Blind test: 0.62 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, 
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 
'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01302195 0.01023006 0.01021552 0.01021671 0.01030731 0.01027513 0.01022935 0.01022243 0.01024985 0.01030946] mean value: 0.010527777671813964 key: score_time value: [0.01048541 0.0103786 0.01038289 0.01038122 0.01036906 0.01034164 0.01037383 0.01037812 0.01037979 0.01041484] mean value: 0.010388541221618652 key: test_mcc value: [0.58002308 0.74242424 0.65909298 0.74242424 0.83971912 0.91666667 0.83743579 0.82575758 0.91287093 0.63636364] mean value: 0.7692778262419881 key: train_mcc value: [0.85368872 0.84407425 0.86341138 0.84407425 0.82438607 0.81495251 0.84389872 0.81495251 0.83499081 0.85473156] mean value: 0.8393160802573247 key: test_accuracy value: [0.7826087 0.86956522 0.82608696 0.86956522 0.91304348 0.95652174 0.91304348 0.91304348 0.95454545 0.81818182] mean value: 0.8816205533596838 key: train_accuracy value: [0.92682927 0.92195122 0.93170732 0.92195122 0.91219512 0.90731707 0.92195122 0.90731707 0.91747573 0.92718447] mean value: 0.9195879706369879 key: test_fscore value: [0.73684211 0.86956522 0.8 0.86956522 0.90909091 0.95652174 0.92307692 0.91666667 0.95238095 0.81818182] mean value: 0.875189154857347 key: train_fscore value: [0.92753623 0.92156863 0.93203883 0.92156863 0.91176471 0.90547264 0.92156863 0.90547264 0.9178744 0.92610837] mean value: 0.9190973699222151 key: test_precision value: [0.875 0.83333333 0.88888889 0.83333333 1. 1. 0.85714286 0.91666667 1. 
0.81818182] mean value: 0.9022546897546897 key: train_precision value: [0.92307692 0.93069307 0.93203883 0.93069307 0.91176471 0.91919192 0.92156863 0.91919192 0.91346154 0.94 ] mean value: 0.9241680606820951 key: test_recall value: [0.63636364 0.90909091 0.72727273 0.90909091 0.83333333 0.91666667 1. 0.91666667 0.90909091 0.81818182] mean value: 0.8575757575757575 key: train_recall value: [0.93203883 0.91262136 0.93203883 0.91262136 0.91176471 0.89215686 0.92156863 0.89215686 0.9223301 0.91262136] mean value: 0.9141918903483723 key: test_roc_auc value: [0.77651515 0.87121212 0.8219697 0.87121212 0.91666667 0.95833333 0.90909091 0.91287879 0.95454545 0.81818182] mean value: 0.8810606060606061 key: train_roc_auc value: [0.92680373 0.92199695 0.93170569 0.92199695 0.91219303 0.90724348 0.92194936 0.90724348 0.91747573 0.92718447] mean value: 0.91957928802589 key: test_jcc value: [0.58333333 0.76923077 0.66666667 0.76923077 0.83333333 0.91666667 0.85714286 0.84615385 0.90909091 0.69230769] mean value: 0.7843156843156843 key: train_jcc value: [0.86486486 0.85454545 0.87272727 0.85454545 0.83783784 0.82727273 0.85454545 0.82727273 0.84821429 0.86238532] mean value: 0.8504211400426996 MCC on Blind test: 0.19 Accuracy on Blind test: 0.59 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:203: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:206: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy 
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), 
('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd... 'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.08454299 0.08204246 0.08172989 0.0863173 0.09503531 0.08147788 0.08158469 0.08152223 0.09179831 0.08183026] mean value: 0.08478813171386719 key: score_time value: [0.01065302 0.01066089 0.01067209 0.01064348 0.01065207 0.010607 0.010638 0.01059103 0.01064205 0.01067996] mean value: 0.010643959045410156 key: test_mcc value: [0.58002308 0.65151515 0.65909298 0.74242424 0.83971912 0.91666667 0.83743579 0.82575758 0.91287093 0.73029674] mean value: 0.7695802278487375 key: train_mcc value: [0.85368872 0.87320324 0.86356283 0.84407425 0.87321531 0.83417421 0.84389872 0.85370265 0.83499081 0.86407767] mean value: 0.8538588407839809 key: test_accuracy value: [0.7826087 0.82608696 0.82608696 0.86956522 0.91304348 0.95652174 0.91304348 0.91304348 0.95454545 0.86363636] mean value: 0.8818181818181818 key: train_accuracy value: [0.92682927 0.93658537 0.93170732 0.92195122 0.93658537 0.91707317 0.92195122 0.92682927 0.91747573 0.93203883] mean value: 0.9269026758228748 key: test_fscore value: [0.73684211 0.81818182 0.8 0.86956522 0.90909091 0.95652174 0.92307692 0.91666667 0.95238095 
0.86956522] mean value: 0.875189154857347 key: train_fscore value: [0.92753623 0.93719807 0.93269231 0.92156863 0.93658537 0.91625616 0.92156863 0.92682927 0.9178744 0.93203883] mean value: 0.9270147884979708 key: test_precision value: [0.875 0.81818182 0.88888889 0.83333333 1. 1. 0.85714286 0.91666667 1. 0.83333333] mean value: 0.9022546897546897 key: train_precision value: [0.92307692 0.93269231 0.92380952 0.93069307 0.93203883 0.92079208 0.92156863 0.9223301 0.91346154 0.93203883] mean value: 0.9252501835996416 key: test_recall value: [0.63636364 0.81818182 0.72727273 0.90909091 0.83333333 0.91666667 1. 0.91666667 0.90909091 0.90909091] mean value: 0.8575757575757575 key: train_recall value: [0.93203883 0.94174757 0.94174757 0.91262136 0.94117647 0.91176471 0.92156863 0.93137255 0.9223301 0.93203883] mean value: 0.9288406624785837 key: test_roc_auc value: [0.77651515 0.82575758 0.8219697 0.87121212 0.91666667 0.95833333 0.90909091 0.91287879 0.95454545 0.86363636] mean value: 0.8810606060606061 key: train_roc_auc value: [0.92680373 0.93656006 0.9316581 0.92199695 0.93660765 0.9170474 0.92194936 0.92685132 0.91747573 0.93203883] mean value: 0.9268989149057681 key: test_jcc value: [0.58333333 0.69230769 0.66666667 0.76923077 0.83333333 0.91666667 0.85714286 0.84615385 0.90909091 0.76923077] mean value: 0.7843156843156843 key: train_jcc value: [0.86486486 0.88181818 0.87387387 0.85454545 0.88073394 0.84545455 0.85454545 0.86363636 0.84821429 0.87272727] mean value: 0.8640414242134425 MCC on Blind test: 0.12 Accuracy on Blind test: 0.56