LSHTM_analysis/scripts/ml/log_embb_8020.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_8020.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 858
PASS: my_features_df and aa_df successfully combined
nrows: 858
ncols: 269
count of NULL values before imputation
or_mychisq 244
log10_or_mychisq 244
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 168
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 175
-------------------------------------------------------------
Successfully split data with stratification: 80/20
Train data size: (358, 175)
Test data size: (90, 175)
y_train numbers: Counter({0: 282, 1: 76})
y_train ratio: 3.710526315789474
y_test_numbers: Counter({0: 71, 1: 19})
y_test ratio: 3.736842105263158
-------------------------------------------------------------
Simple Random OverSampling
Counter({0: 282, 1: 282})
(564, 175)
Simple Random UnderSampling
Counter({0: 76, 1: 76})
(152, 175)
Simple Combined Over and UnderSampling
Counter({0: 282, 1: 282})
(564, 175)
SMOTE_NC OverSampling
Counter({0: 282, 1: 282})
(564, 175)
#####################################################################
Running ML analysis: 80/20 split
Gene name: embB
Drug name: ethambutol
Output directory: /home/tanu/git/Data/ethambutol/output/ml/tts_8020/
Sanity checks:
ML source data size: (448, 175)
Total input features: (358, 175)
Target feature numbers: Counter({0: 282, 1: 76})
Target features ratio: 3.710526315789474
#####################################################################
================================================================
Structural features (n): 36
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.0428791 0.04306722 0.03533816 0.03511214 0.036165 0.04105043
0.03672242 0.0365274 0.03594804 0.03573561]
mean value: 0.037854552268981934
key: score_time
value: [0.0125525 0.01230192 0.0132637 0.01333547 0.01366806 0.01352763
0.0132668 0.01345658 0.01344943 0.01346922]
mean value: 0.013229131698608398
key: test_mcc
value: [0.8174367 0.49365725 0.44883281 0.75134288 0.51785714 0.45374261
0.51785714 0.67857143 0.71842121 0.72019314]
mean value: 0.6117912318227132
key: train_mcc
value: [0.79905267 0.83859776 0.82668723 0.78613568 0.81652347 0.8365424
0.79631634 0.84662994 0.82882139 0.80889737]
mean value: 0.818420423682095
key: test_accuracy
value: [0.94444444 0.86111111 0.83333333 0.91666667 0.83333333 0.83333333
0.83333333 0.88888889 0.91428571 0.91428571]
mean value: 0.8773015873015872
key: train_accuracy
value: [0.93478261 0.94720497 0.94409938 0.93167702 0.94099379 0.94720497
0.93478261 0.95031056 0.94427245 0.9380805 ]
mean value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
0.9413408841797588
key: test_fscore
value: [0.83333333 0.44444444 0.5 0.76923077 0.625 0.4
0.625 0.75 0.72727273 0.76923077]
mean value: 0.6443512043512043
key: train_fscore
value: [0.83464567 0.86821705 0.85714286 0.81666667 0.84552846 0.864
0.82644628 0.87096774 0.859375 0.84126984]
mean value: 0.8484259566846042
key: test_precision
value: [1. 1. 0.75 1. 0.625 1.
0.625 0.75 1. 0.83333333]
mean value: 0.8583333333333334
key: train_precision
value: [0.9137931 0.93333333 0.93103448 0.94230769 0.94545455 0.94736842
0.94339623 0.96428571 0.93220339 0.92982456]
mean value: 0.9383001470289924
key: test_recall
value: [0.71428571 0.28571429 0.375 0.625 0.625 0.25
0.625 0.75 0.57142857 0.71428571]
mean value: 0.5535714285714286
key: train_recall
value: [0.76811594 0.8115942 0.79411765 0.72058824 0.76470588 0.79411765
0.73529412 0.79411765 0.79710145 0.76811594]
mean value: 0.7747868712702473
key: test_roc_auc
value: [0.85714286 0.64285714 0.66964286 0.8125 0.75892857 0.625
0.75892857 0.83928571 0.78571429 0.83928571]
mean value: 0.7589285714285714
key: train_roc_auc
value: [0.87417655 0.89789196 0.88918481 0.85438861 0.87644743 0.89115331
0.86174155 0.89312182 0.89067671 0.87618396]
mean value: 0.8804966692724209
key: test_jcc
value: [0.71428571 0.28571429 0.33333333 0.625 0.45454545 0.25
0.45454545 0.6 0.57142857 0.625 ]
mean value: 0.49138528138528137
key: train_jcc
value: [0.71621622 0.76712329 0.75 0.69014085 0.73239437 0.76056338
0.70422535 0.77142857 0.75342466 0.7260274 ]
mean value: 0.7371544073772512
MCC on Blind test: 0.76
Accuracy on Blind test: 0.92
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.90058208 0.80069208 0.91843367 0.97248459 0.82589054 0.95298767
0.91078734 0.78506351 1.01478314 0.77273583]
mean value: 0.8854440450668335
key: score_time
value: [0.0133543 0.01355958 0.0135107 0.01562595 0.01443386 0.01595211
0.01460838 0.01361895 0.01384115 0.01354361]
mean value: 0.014204859733581543
key: test_mcc
value: [0.75032247 0.61369649 0.65737574 0.75134288 0.41267736 0.45374261
0.67857143 0.77151675 0.61237244 0.61237244]
mean value: 0.6313990594842422
key: train_mcc
value: [0.85829157 0.96310935 0.98135711 0.96271422 0.97192696 0.89565519
0.96271422 1. 0.89735962 0.96314048]
mean value: 0.9456268716050406
key: test_accuracy
value: [0.91666667 0.88888889 0.88888889 0.91666667 0.80555556 0.83333333
0.88888889 0.91666667 0.88571429 0.88571429]
mean value: 0.8826984126984126
key: train_accuracy
value: [0.95341615 0.98757764 0.99378882 0.98757764 0.99068323 0.96583851
0.98757764 1. 0.96594427 0.9876161 ]
mean value: 0.982001999884622
key: test_fscore
value: [0.8 0.6 0.71428571 0.76923077 0.53333333 0.4
0.75 0.82352941 0.6 0.66666667]
mean value: 0.665704589528119
key: train_fscore
value: [0.88549618 0.97101449 0.98529412 0.97058824 0.97777778 0.91603053
0.97058824 1. 0.91851852 0.97101449]
mean value: 0.9566322587596089
key: test_precision
value: [0.75 1. 0.83333333 1. 0.57142857 1.
0.75 0.77777778 1. 0.8 ]
mean value: 0.8482539682539683
key: train_precision
value: [0.93548387 0.97101449 0.98529412 0.97058824 0.98507463 0.95238095
0.97058824 1. 0.93939394 0.97101449]
mean value: 0.9680832963350846
key: test_recall
value: [0.85714286 0.42857143 0.625 0.625 0.5 0.25
0.75 0.875 0.42857143 0.57142857]
mean value: 0.5910714285714286
key: train_recall
value: [0.84057971 0.97101449 0.98529412 0.97058824 0.97058824 0.88235294
0.97058824 1. 0.89855072 0.97101449]
mean value: 0.9460571184995737
key: test_roc_auc
value: [0.89408867 0.71428571 0.79464286 0.8125 0.69642857 0.625
0.83928571 0.90178571 0.71428571 0.76785714]
mean value: 0.7760160098522167
key: train_roc_auc
value: [0.91238472 0.98155468 0.99067855 0.98135711 0.98332561 0.93527096
0.98135711 1. 0.94140135 0.98157024]
mean value: 0.968890032593287
key: test_jcc
value: [0.66666667 0.42857143 0.55555556 0.625 0.36363636 0.25
0.6 0.7 0.42857143 0.5 ]
mean value: 0.5118001443001443
key: train_jcc
value: [0.79452055 0.94366197 0.97101449 0.94285714 0.95652174 0.84507042
0.94285714 1. 0.84931507 0.94366197]
mean value: 0.9189480500233883
MCC on Blind test: 0.8
Accuracy on Blind test: 0.93
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.0145061 0.01411271 0.00958824 0.00958681 0.01035023 0.00946975
0.00950241 0.01032686 0.01084828 0.00958943]
mean value: 0.01078808307647705
key: score_time
value: [0.01501393 0.01024151 0.00904894 0.00896454 0.00931573 0.00904608
0.00889969 0.00926065 0.0098412 0.00942135]
mean value: 0.009905362129211425
key: test_mcc
value: [0.40804713 0.34527065 0.5157267 0.6172134 0.3086067 0.2438548
0.29366622 0.40089186 0.40147753 0.68640647]
mean value: 0.4221161472620327
key: train_mcc
value: [0.62471066 0.5915192 0.67760901 0.65520113 0.64418833 0.66117244
0.66633852 0.52011895 0.60022186 0.64725803]
mean value: 0.62883381442086
key: test_accuracy
value: [0.69444444 0.80555556 0.80555556 0.86111111 0.75 0.75
0.69444444 0.66666667 0.68571429 0.88571429]
mean value: 0.7599206349206349
key: train_accuracy
value: [0.85093168 0.83850932 0.87267081 0.86335404 0.8757764 0.86956522
0.86645963 0.72981366 0.79566563 0.85758514]
mean value: 0.8420331519335423
key: test_fscore
value: [0.52173913 0.46153846 0.63157895 0.70588235 0.47058824 0.4
0.47619048 0.53846154 0.52173913 0.75 ]
mean value: 0.5477718272663756
key: train_fscore
value: [0.70731707 0.68292683 0.74534161 0.72839506 0.72222222 0.73417722
0.73619632 0.60273973 0.67 0.72289157]
mean value: 0.705220762779721
key: test_precision
value: [0.375 0.5 0.54545455 0.66666667 0.44444444 0.42857143
0.38461538 0.38888889 0.375 0.66666667]
mean value: 0.4775308025308025
key: train_precision
value: [0.61052632 0.58947368 0.64516129 0.62765957 0.68421053 0.64444444
0.63157895 0.43708609 0.51145038 0.6185567 ]
mean value: 0.6000147958344869
key: test_recall
value: [0.85714286 0.42857143 0.75 0.75 0.5 0.375
0.625 0.875 0.85714286 0.85714286]
mean value: 0.6875
key: train_recall
value: [0.84057971 0.8115942 0.88235294 0.86764706 0.76470588 0.85294118
0.88235294 0.97058824 0.97101449 0.86956522]
mean value: 0.8713341858482523
key: test_roc_auc
value: [0.75615764 0.66256158 0.78571429 0.82142857 0.66071429 0.61607143
0.66964286 0.74107143 0.75 0.875 ]
mean value: 0.7338362068965517
key: train_roc_auc
value: [0.84716733 0.828722 0.87621584 0.86492589 0.83510885 0.86347846
0.87227883 0.81797128 0.85952299 0.86194796]
mean value: 0.8527339442515047
key: test_jcc
value: [0.35294118 0.3 0.46153846 0.54545455 0.30769231 0.25
0.3125 0.36842105 0.35294118 0.6 ]
mean value: 0.385148872025807
key: train_jcc
value: [0.54716981 0.51851852 0.59405941 0.57281553 0.56521739 0.58
0.58252427 0.43137255 0.5037594 0.56603774]
mean value: 0.5461474616274362
MCC on Blind test: 0.53
Accuracy on Blind test: 0.82
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01099086 0.01003337 0.010741 0.01009178 0.01001143 0.01029563
0.01055908 0.01037931 0.00990558 0.00981903]
mean value: 0.01028270721435547
key: score_time
value: [0.00966334 0.00921583 0.00926971 0.009969 0.00917459 0.00983572
0.00905895 0.0090332 0.00921869 0.00948954]
mean value: 0.009392857551574707
key: test_mcc
value: [ 0.75032247 0.2085873 0.16205093 0.47809144 0.58149992 -0.18898224
0.35714286 0.0805823 0.49391458 0.10206207]
mean value: 0.30252716321584283
key: train_mcc
value: [0.39769343 0.45897008 0.43461577 0.4144431 0.49728141 0.50633817
0.48412839 0.39761525 0.41442016 0.42938548]
mean value: 0.4434891238000965
key: test_accuracy
value: [0.91666667 0.77777778 0.77777778 0.83333333 0.86111111 0.66666667
0.77777778 0.75 0.85714286 0.77142857]
mean value: 0.798968253968254
key: train_accuracy
value: [0.81677019 0.83850932 0.82608696 0.82608696 0.84782609 0.85093168
0.84161491 0.81987578 0.82352941 0.82352941]
mean value: 0.8314760686883449
key: test_fscore
value: [0.8 0.33333333 0.2 0.57142857 0.66666667 0.
0.5 0.18181818 0.54545455 0.2 ]
mean value: 0.3998701298701299
key: train_fscore
value: [0.4957265 0.52727273 0.53333333 0.5 0.57391304 0.57894737
0.57142857 0.49122807 0.50434783 0.52892562]
mean value: 0.5305123055757547
key: test_precision
value: [0.75 0.4 0.5 0.66666667 0.71428571 0.
0.5 0.33333333 0.75 0.33333333]
mean value: 0.49476190476190474
key: train_precision
value: [0.60416667 0.70731707 0.61538462 0.63636364 0.70212766 0.7173913
0.66666667 0.60869565 0.63043478 0.61538462]
mean value: 0.6503932672341834
key: test_recall
value: [0.85714286 0.28571429 0.125 0.5 0.625 0.
0.5 0.125 0.42857143 0.14285714]
mean value: 0.35892857142857143
key: train_recall
value: [0.42028986 0.42028986 0.47058824 0.41176471 0.48529412 0.48529412
0.5 0.41176471 0.42028986 0.46376812]
mean value: 0.44893435635123613
key: test_roc_auc
value: [0.89408867 0.591133 0.54464286 0.71428571 0.77678571 0.42857143
0.67857143 0.52678571 0.69642857 0.53571429]
mean value: 0.6387007389162562
key: train_roc_auc
value: [0.67259552 0.68642951 0.69592404 0.67438629 0.715088 0.71705651
0.71653543 0.67044928 0.67668036 0.69251398]
mean value: 0.6917658928125731
key: test_jcc
value: [0.66666667 0.2 0.11111111 0.4 0.5 0.
0.33333333 0.1 0.375 0.11111111]
mean value: 0.2797222222222222
key: train_jcc
value: [0.32954545 0.35802469 0.36363636 0.33333333 0.40243902 0.40740741
0.4 0.3255814 0.3372093 0.35955056]
mean value: 0.3616727534142999
MCC on Blind test: 0.38
Accuracy on Blind test: 0.81
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00918794 0.0107317 0.01008606 0.0098629 0.00994635 0.00929689
0.01015115 0.01002431 0.01017118 0.01010108]
mean value: 0.009955954551696778
key: score_time
value: [0.08292389 0.01307893 0.01189065 0.01137209 0.01231623 0.0125916
0.01275134 0.01540232 0.01305771 0.01362872]
mean value: 0.019901347160339356
key: test_mcc
value: [-0.08304548 0.34404556 -0.12964074 0.45374261 0.45374261 -0.09035079
0.32232919 0.31622777 -0.08574929 -0.08574929]
mean value: 0.1415552123996879
key: train_mcc
value: [0.49666776 0.39869846 0.50648694 0.43431192 0.43289908 0.46108514
0.37190677 0.38709528 0.47029901 0.47029901]
mean value: 0.4429749369850348
key: test_accuracy
value: [0.77777778 0.83333333 0.72222222 0.83333333 0.83333333 0.75
0.80555556 0.80555556 0.77142857 0.77142857]
mean value: 0.7903968253968254
key: train_accuracy
value: [0.85093168 0.82919255 0.85403727 0.83850932 0.83850932 0.8447205
0.82608696 0.82919255 0.84520124 0.84520124]
mean value: 0.8401582601003789
key: test_fscore
value: [0. 0.25 0. 0.4 0.4 0.
0.36363636 0.22222222 0. 0. ]
mean value: 0.16358585858585858
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
key: train_fscore
value: [0.51020408 0.38202247 0.48351648 0.40909091 0.42222222 0.46808511
0.33333333 0.36781609 0.47916667 0.47916667]
mean value: 0.4334624033376049
key: test_precision
value: [0. 1. 0. 1. 1. 0.
0.66666667 1. 0. 0. ]
mean value: 0.4666666666666667
key: train_precision
value: [0.86206897 0.85 0.95652174 0.9 0.86363636 0.84615385
0.875 0.84210526 0.85185185 0.85185185]
mean value: 0.8699189881299484
key: test_recall
value: [0. 0.14285714 0. 0.25 0.25 0.
0.25 0.125 0. 0. ]
mean value: 0.10178571428571428
key: train_recall
value: [0.36231884 0.24637681 0.32352941 0.26470588 0.27941176 0.32352941
0.20588235 0.23529412 0.33333333 0.33333333]
mean value: 0.290771526001705
key: test_roc_auc
value: [0.48275862 0.57142857 0.46428571 0.625 0.625 0.48214286
0.60714286 0.5625 0.48214286 0.48214286]
mean value: 0.538454433497537
key: train_roc_auc
value: [0.67325428 0.61725955 0.6597962 0.62841593 0.63380037 0.65389069
0.59900417 0.61174155 0.65879265 0.65879265]
mean value: 0.6394748047362482
key: test_jcc
value: [0. 0.14285714 0. 0.25 0.25 0.
0.22222222 0.125 0. 0. ]
mean value: 0.0990079365079365
key: train_jcc
value: [0.34246575 0.23611111 0.31884058 0.25714286 0.26760563 0.30555556
0.2 0.22535211 0.31506849 0.31506849]
mean value: 0.2783210589724569
MCC on Blind test: 0.3
Accuracy on Blind test: 0.81
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01502514 0.01375127 0.01382565 0.01428938 0.01496148 0.01451397
0.01644087 0.01448464 0.01387644 0.01542759]
mean value: 0.014659643173217773
key: score_time
value: [0.01057029 0.01050878 0.01076698 0.01031327 0.01009369 0.01014829
0.01023912 0.0101285 0.01013803 0.01021695]
mean value: 0.010312390327453614
key: test_mcc
value: [0.49365725 0. 0.16205093 0.31622777 0.47809144 0.
0.2362278 0.44883281 0. 0.49236596]
mean value: 0.26274539603801333
key: train_mcc
value: [0.59932645 0.65172653 0.65308612 0.65308612 0.63080736 0.67251176
0.65979056 0.65979056 0.63347284 0.6251464 ]
mean value: 0.6438744675217546
key: test_accuracy
value: [0.86111111 0.80555556 0.77777778 0.80555556 0.83333333 0.77777778
0.77777778 0.83333333 0.8 0.85714286]
mean value: 0.812936507936508
key: train_accuracy
value: [0.8757764 0.89130435 0.89130435 0.89130435 0.88509317 0.89751553
0.89440994 0.89440994 0.88544892 0.88235294]
mean value: 0.8888919870007499
key: test_fscore
value: [0.44444444 0. 0.2 0.22222222 0.57142857 0.
0.33333333 0.5 0. 0.44444444]
mean value: 0.2715873015873016
key: train_fscore
value: [0.6 0.68468468 0.65346535 0.65346535 0.62626263 0.68571429
0.67924528 0.67924528 0.6407767 0.62 ]
mean value: 0.6522859554797765
key: test_precision
value: [1. 0. 0.5 1. 0.66666667 0.
0.5 0.75 0. 1. ]
mean value: 0.5416666666666666
key: train_precision
value: [0.96774194 0.9047619 1. 1. 1. 0.97297297
0.94736842 0.94736842 0.97058824 1. ]
mean value: 0.9710801890618129
key: test_recall
value: [0.28571429 0. 0.125 0.125 0.5 0.
0.25 0.375 0. 0.28571429]
mean value: 0.19464285714285715
key: train_recall
value: [0.43478261 0.55072464 0.48529412 0.48529412 0.45588235 0.52941176
0.52941176 0.52941176 0.47826087 0.44927536]
mean value: 0.49277493606138106
key: test_roc_auc
value: [0.64285714 0.5 0.54464286 0.5625 0.71428571 0.5
0.58928571 0.66964286 0.5 0.64285714]
mean value: 0.5866071428571429
key: train_roc_auc
value: [0.71541502 0.76745718 0.74264706 0.74264706 0.72794118 0.76273738
0.76076887 0.76076887 0.73716193 0.72463768]
mean value: 0.7442182233759956
key: test_jcc
value: [0.28571429 0. 0.11111111 0.125 0.4 0.
0.2 0.33333333 0. 0.28571429]
mean value: 0.1740873015873016
key: train_jcc
value: [0.42857143 0.52054795 0.48529412 0.48529412 0.45588235 0.52173913
0.51428571 0.51428571 0.47142857 0.44927536]
mean value: 0.4846604454765825
MCC on Blind test: 0.41
Accuracy on Blind test: 0.83
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.45310259 1.60784292 1.29070163 1.57894993 1.62366939 1.49906635
1.52056623 1.68461752 1.8801136 1.26551223]
mean value: 1.5404142379760741
key: score_time
value: [0.0127666 0.01392817 0.01249576 0.01368737 0.0171051 0.01363277
0.01388717 0.02197933 0.01406717 0.01296186]
mean value: 0.014651131629943848
key: test_mcc
value: [0.68887476 0.49365725 0.36493797 0.75134288 0.46291005 0.16205093
0.67857143 0.6172134 0.61237244 0.61237244]
mean value: 0.5444303548709625
key: train_mcc
value: [0.98155468 0.99086739 0.98135711 0.98135711 0.97192696 0.97192696
0.98135711 0.9906716 0.98157024 0.9722504 ]
mean value: 0.9804839554759929
key: test_accuracy
value: [0.88888889 0.86111111 0.80555556 0.91666667 0.80555556 0.77777778
0.88888889 0.86111111 0.88571429 0.88571429]
mean value: 0.8576984126984127
key: train_accuracy
value: [0.99378882 0.99689441 0.99378882 0.99378882 0.99068323 0.99068323
0.99378882 0.99689441 0.99380805 0.99071207]
mean value: 0.993483068284522
key: test_fscore
value: [0.75 0.44444444 0.46153846 0.76923077 0.58823529 0.2
0.75 0.70588235 0.6 0.66666667]
mean value: 0.5935997988939166
key: train_fscore
value: [0.98550725 0.99280576 0.98529412 0.98529412 0.97777778 0.97777778
0.98529412 0.99259259 0.98550725 0.97810219]
mean value: 0.9845952939019653
key: test_precision
value: [0.66666667 1. 0.6 1. 0.55555556 0.5
0.75 0.66666667 1. 0.8 ]
mean value: 0.7538888888888888
key: train_precision
value: [0.98550725 0.98571429 0.98529412 0.98529412 0.98507463 0.98507463
0.98529412 1. 0.98550725 0.98529412]
mean value: 0.9868054502787488
key: test_recall
value: [0.85714286 0.28571429 0.375 0.625 0.625 0.125
0.75 0.75 0.42857143 0.57142857]
mean value: 0.5392857142857143
key: train_recall
value: [0.98550725 1. 0.98529412 0.98529412 0.97058824 0.97058824
0.98529412 0.98529412 0.98550725 0.97101449]
mean value: 0.9824381926683717
key: test_roc_auc
value: [0.87684729 0.64285714 0.65178571 0.8125 0.74107143 0.54464286
0.83928571 0.82142857 0.71428571 0.76785714]
mean value: 0.741256157635468
key: train_roc_auc
value: [0.99077734 0.99802372 0.99067855 0.99067855 0.98332561 0.98332561
0.99067855 0.99264706 0.99078512 0.98353874]
mean value: 0.9894458866612843
key: test_jcc
value: [0.6 0.28571429 0.3 0.625 0.41666667 0.11111111
0.6 0.54545455 0.42857143 0.5 ]
mean value: 0.44125180375180373
key: train_jcc
value: [0.97142857 0.98571429 0.97101449 0.97101449 0.95652174 0.95652174
0.97101449 0.98529412 0.97142857 0.95714286]
mean value: 0.9697095359883083
MCC on Blind test: 0.72
Accuracy on Blind test: 0.91
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02383637 0.02124 0.01789117 0.01587868 0.0169847 0.01740932
0.01692271 0.01825738 0.01615906 0.01933932]
mean value: 0.018391871452331544
key: score_time
value: [0.01232696 0.010185 0.00895429 0.00890899 0.00881505 0.00891113
0.00893497 0.00903082 0.00903559 0.00908685]
mean value: 0.009418964385986328
key: test_mcc
value: [0.8174367 0.75032247 1. 0.91914503 0.53300179 0.66143783
0.51785714 0.86189161 0.49391458 0.81649658]
mean value: 0.737150373784473
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94444444 0.91666667 1. 0.97222222 0.77777778 0.88888889
0.83333333 0.94444444 0.85714286 0.94285714]
mean value: 0.9077777777777778
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.83333333 0.8 1. 0.93333333 0.63636364 0.66666667
0.625 0.88888889 0.54545455 0.83333333]
mean value: 0.7762373737373737
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.75 1. 1. 0.5 1. 0.625 0.8 0.75 1. ]
mean value: 0.8425
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.71428571 0.85714286 1. 0.875 0.875 0.5
0.625 1. 0.42857143 0.71428571]
mean value: 0.7589285714285714
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.85714286 0.89408867 1. 0.9375 0.8125 0.75
0.75892857 0.96428571 0.69642857 0.85714286]
mean value: 0.8528017241379311
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.71428571 0.66666667 1. 0.875 0.46666667 0.5
0.45454545 0.8 0.375 0.71428571]
mean value: 0.6566450216450217
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.84
Accuracy on Blind test: 0.94
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10711122 0.10529757 0.10800672 0.10813236 0.10637426 0.10711551
0.106498 0.10879707 0.10712099 0.10767961]
mean value: 0.10721333026885986
key: score_time
value: [0.01776719 0.01798534 0.01802421 0.01788712 0.01799774 0.01796985
0.01869559 0.0179739 0.01853013 0.01776457]
mean value: 0.018059563636779786
key: test_mcc
value: [0.71962292 0.1872493 0.65737574 0.66143783 0.56354451 0.16205093
0.58149992 0.58149992 0.34299717 0.71842121]
mean value: 0.5175699436874827
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91666667 0.80555556 0.88888889 0.88888889 0.83333333 0.77777778
0.86111111 0.86111111 0.82857143 0.91428571]
mean value: 0.8576190476190476
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.72727273 0.22222222 0.71428571 0.66666667 0.66666667 0.2
0.66666667 0.66666667 0.25 0.72727273]
mean value: 0.5507720057720058
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.5 0.83333333 1. 0.6 0.5
0.71428571 0.71428571 1. 1. ]
mean value: 0.7861904761904762
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.57142857 0.14285714 0.625 0.5 0.75 0.125
0.625 0.625 0.14285714 0.57142857]
mean value: 0.46785714285714286
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.78571429 0.55418719 0.79464286 0.75 0.80357143 0.54464286
0.77678571 0.77678571 0.57142857 0.78571429]
mean value: 0.7143472906403942
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.57142857 0.125 0.55555556 0.5 0.5 0.11111111
0.5 0.5 0.14285714 0.57142857]
mean value: 0.40773809523809523
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.68
Accuracy on Blind test: 0.9
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00979185 0.01068974 0.01069331 0.00985026 0.0098598 0.00974846
0.01084542 0.00980043 0.01022649 0.01085448]
mean value: 0.010236024856567383
key: score_time
value: [0.00915313 0.00982594 0.00975847 0.00891304 0.00955153 0.00960112
0.0089066 0.00889993 0.00894642 0.00965571]
mean value: 0.009321188926696778
key: test_mcc
value: [ 0.43895468 -0.03138824 0.19642857 0.2438548 0.29366622 0.75134288
0.26519742 0.07503225 0.49391458 0.15161961]
mean value: 0.28786227687707744
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.72222222 0.69444444 0.72222222 0.75 0.69444444 0.91666667
0.72222222 0.69444444 0.85714286 0.74285714]
mean value: 0.7516666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.54545455 0.15384615 0.375 0.4 0.47619048 0.76923077
0.44444444 0.26666667 0.54545455 0.30769231]
mean value: 0.4283979908979909
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.4 0.16666667 0.375 0.42857143 0.38461538 1.
0.4 0.28571429 0.75 0.33333333]
mean value: 0.4523901098901099
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.85714286 0.14285714 0.375 0.375 0.625 0.625
0.5 0.25 0.42857143 0.28571429]
mean value: 0.4464285714285714
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.77339901 0.48522167 0.59821429 0.61607143 0.66964286 0.8125
0.64285714 0.53571429 0.69642857 0.57142857]
mean value: 0.6401477832512316
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.375 0.08333333 0.23076923 0.25 0.3125 0.625
0.28571429 0.15384615 0.375 0.18181818]
mean value: 0.28729811854811854
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.72
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.50880575 1.47889471 1.49357581 1.51851034 1.48310709 1.48366189
1.48690367 1.52164292 1.48334312 1.50404835]
mean value: 1.4962493658065796
key: score_time
value: [0.09659505 0.09937906 0.0998745 0.09763646 0.09497857 0.09289384
0.09455442 0.09473515 0.09559774 0.09992361]
mean value: 0.09661684036254883
key: test_mcc
value: [1. 0.49365725 0.83666003 0.66143783 0.67857143 0.75134288
0.77151675 0.77151675 0.61237244 0.90971765]
mean value: 0.7486793002774902
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.86111111 0.94444444 0.88888889 0.88888889 0.91666667
0.91666667 0.91666667 0.88571429 0.97142857]
mean value: 0.919047619047619
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.44444444 0.85714286 0.66666667 0.75 0.76923077
0.82352941 0.82352941 0.6 0.92307692]
mean value: 0.7657620484091072
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 0.75 1.
0.77777778 0.77777778 1. 1. ]
mean value: 0.9305555555555556
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.28571429 0.75 0.5 0.75 0.625
0.875 0.875 0.42857143 0.85714286]
mean value: 0.6946428571428571
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.64285714 0.875 0.75 0.83928571 0.8125
0.90178571 0.90178571 0.71428571 0.92857143]
mean value: 0.8366071428571429
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.28571429 0.75 0.5 0.6 0.625
0.7 0.7 0.42857143 0.85714286]
mean value: 0.6446428571428571
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.83
Accuracy on Blind test: 0.94
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.79229784 0.95148969 0.91167283 0.94181609 0.88025999 0.92065978
0.91286707 0.9315846 0.88310695 0.94630051]
mean value: 1.0072055339813233
key: score_time
value: [0.20599699 0.3146832 0.24151015 0.27133441 0.13036633 0.22741055
0.23978066 0.25486302 0.21415901 0.17490315]
mean value: 0.2275007486343384
key: test_mcc
value: [0.71962292 0.34404556 0.66143783 0.66143783 0.47809144 0.45374261
0.75032247 0.47809144 0.49236596 0.61237244]
mean value: 0.565153049910657
key: train_mcc
value: [0.88709235 0.92532149 0.9244842 0.93401658 0.9244842 0.9244842
0.90518666 0.90534273 0.92534731 0.90647794]
mean value: 0.9162237656285032
key: test_accuracy
value: [0.91666667 0.83333333 0.88888889 0.88888889 0.83333333 0.83333333
0.91666667 0.83333333 0.85714286 0.88571429]
mean value: 0.8687301587301587
key: train_accuracy
value: [0.96273292 0.97515528 0.97515528 0.97826087 0.97515528 0.97515528
0.9689441 0.9689441 0.9752322 0.96904025]
mean value: 0.9723775551410495
key: test_fscore
value: [0.72727273 0.25 0.66666667 0.66666667 0.57142857 0.4
0.8 0.57142857 0.44444444 0.6 ]
mean value: 0.5697907647907648
key: train_fscore
value: [0.90769231 0.93939394 0.93846154 0.94656489 0.93846154 0.93846154
0.92307692 0.921875 0.94029851 0.92307692]
mean value: 0.9317363101583578
key: test_precision
value: [1. 1. 1. 1. 0.66666667 1.
0.85714286 0.66666667 1. 1. ]
mean value: 0.919047619047619
key: train_precision
value: [0.96721311 0.98412698 0.98387097 0.98412698 0.98387097 0.98387097
0.96774194 0.98333333 0.96923077 0.98360656]
mean value: 0.9790992581658896
key: test_recall
value: [0.57142857 0.14285714 0.5 0.5 0.5 0.25
0.75 0.5 0.28571429 0.42857143]
mean value: 0.44285714285714284
key: train_recall
value: [0.85507246 0.89855072 0.89705882 0.91176471 0.89705882 0.89705882
0.88235294 0.86764706 0.91304348 0.86956522]
mean value: 0.8889173060528559
key: test_roc_auc
value: [0.78571429 0.57142857 0.75 0.75 0.71428571 0.625
0.85714286 0.71428571 0.64285714 0.71428571]
mean value: 0.7125
key: train_roc_auc
value: [0.92358366 0.94729908 0.94656091 0.95391385 0.94656091 0.94656091
0.93723946 0.93185503 0.95258473 0.9328141 ]
mean value: 0.941897263713926
key: test_jcc
value: [0.57142857 0.14285714 0.5 0.5 0.4 0.25
0.66666667 0.4 0.28571429 0.42857143]
mean value: 0.4145238095238095
key: train_jcc
value: [0.83098592 0.88571429 0.88405797 0.89855072 0.88405797 0.88405797
0.85714286 0.85507246 0.88732394 0.85714286]
mean value: 0.8724106960604205
MCC on Blind test: 0.76
Accuracy on Blind test: 0.92
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02311754 0.00943446 0.00950122 0.00938201 0.00950313 0.00959158
0.00952291 0.00950456 0.00949812 0.00953507]
mean value: 0.010859060287475585
key: score_time
value: [0.01207566 0.0087018 0.00878215 0.00869799 0.00875616 0.00878167
0.0086906 0.00866127 0.00872397 0.00867581]
mean value: 0.00905470848083496
key: test_mcc
value: [ 0.75032247 0.2085873 0.16205093 0.47809144 0.58149992 -0.18898224
0.35714286 0.0805823 0.49391458 0.10206207]
mean value: 0.30252716321584283
key: train_mcc
value: [0.39769343 0.45897008 0.43461577 0.4144431 0.49728141 0.50633817
0.48412839 0.39761525 0.41442016 0.42938548]
mean value: 0.4434891238000965
key: test_accuracy
value: [0.91666667 0.77777778 0.77777778 0.83333333 0.86111111 0.66666667
0.77777778 0.75 0.85714286 0.77142857]
mean value: 0.798968253968254
key: train_accuracy
value: [0.81677019 0.83850932 0.82608696 0.82608696 0.84782609 0.85093168
0.84161491 0.81987578 0.82352941 0.82352941]
mean value: 0.8314760686883449
key: test_fscore
value: [0.8 0.33333333 0.2 0.57142857 0.66666667 0.
0.5 0.18181818 0.54545455 0.2 ]
mean value: 0.3998701298701299
key: train_fscore
value: [0.4957265 0.52727273 0.53333333 0.5 0.57391304 0.57894737
0.57142857 0.49122807 0.50434783 0.52892562]
mean value: 0.5305123055757547
key: test_precision
value: [0.75 0.4 0.5 0.66666667 0.71428571 0.
0.5 0.33333333 0.75 0.33333333]
mean value: 0.49476190476190474
key: train_precision
value: [0.60416667 0.70731707 0.61538462 0.63636364 0.70212766 0.7173913
0.66666667 0.60869565 0.63043478 0.61538462]
mean value: 0.6503932672341834
key: test_recall
value: [0.85714286 0.28571429 0.125 0.5 0.625 0.
0.5 0.125 0.42857143 0.14285714]
mean value: 0.35892857142857143
key: train_recall
value: [0.42028986 0.42028986 0.47058824 0.41176471 0.48529412 0.48529412
0.5 0.41176471 0.42028986 0.46376812]
mean value: 0.44893435635123613
key: test_roc_auc
value: [0.89408867 0.591133 0.54464286 0.71428571 0.77678571 0.42857143
0.67857143 0.52678571 0.69642857 0.53571429]
mean value: 0.6387007389162562
key: train_roc_auc
value: [0.67259552 0.68642951 0.69592404 0.67438629 0.715088 0.71705651
0.71653543 0.67044928 0.67668036 0.69251398]
mean value: 0.6917658928125731
key: test_jcc
value: [0.66666667 0.2 0.11111111 0.4 0.5 0.
0.33333333 0.1 0.375 0.11111111]
mean value: 0.2797222222222222
key: train_jcc
value: [0.32954545 0.35802469 0.36363636 0.33333333 0.40243902 0.40740741
0.4 0.3255814 0.3372093 0.35955056]
mean value: 0.3616727534142999
MCC on Blind test: 0.38
Accuracy on Blind test: 0.81
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.09641838 0.05527234 0.0620811 0.07439089 0.05781317 0.06530595
0.06510544 0.08454537 0.20302868 0.05387855]
mean value: 0.08178398609161378
key: score_time
value: [0.0110662 0.01033831 0.01064897 0.01097441 0.01094913 0.01161528
0.01144314 0.01093102 0.01129246 0.01139951]
mean value: 0.011065840721130371
key: test_mcc
value: [1. 0.91914503 1. 0.91914503 0.80582296 0.91914503
0.9258201 0.86189161 0.72019314 0.90971765]
mean value: 0.8980880554918083
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.97222222 1. 0.97222222 0.91666667 0.97222222
0.97222222 0.94444444 0.91428571 0.97142857]
mean value: 0.9635714285714285
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.93333333 1. 0.93333333 0.84210526 0.93333333
0.94117647 0.88888889 0.76923077 0.92307692]
mean value: 0.9164478314942711
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.875 1. 1. 0.72727273 1.
0.88888889 0.8 0.83333333 1. ]
mean value: 0.912449494949495
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 0.875 1. 0.875
1. 1. 0.71428571 0.85714286]
mean value: 0.9321428571428572
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.98275862 1. 0.9375 0.94642857 0.9375
0.98214286 0.96428571 0.83928571 0.92857143]
mean value: 0.9518472906403941
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.875 1. 0.875 0.72727273 0.875
0.88888889 0.8 0.625 0.85714286]
mean value: 0.8523304473304474
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.91
Accuracy on Blind test: 0.97
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04570127 0.09128404 0.06899452 0.07317495 0.06892586 0.06894684
0.07744288 0.03692055 0.05794311 0.05879235]
mean value: 0.06481263637542725
key: score_time
value: [0.01260543 0.01506281 0.02454662 0.02456522 0.02190804 0.01689649
0.01267576 0.01267529 0.01691985 0.01226139]
mean value: 0.017011690139770507
key: test_mcc
value: [0.72192954 0.6453202 0.55814043 0.5157267 0.66077483 0.67857143
0.58149992 0.6172134 0.81649658 0.72019314]
mean value: 0.6515866163842986
key: train_mcc
value: [0.9358192 0.9358192 0.93513953 0.93513953 0.93513953 0.93513953
0.93513953 0.95318232 0.93587381 0.92628095]
mean value: 0.9362673146575943
key: test_accuracy
value: [0.91666667 0.88888889 0.86111111 0.80555556 0.86111111 0.88888889
0.86111111 0.86111111 0.94285714 0.91428571]
mean value: 0.8801587301587301
key: train_accuracy
value: [0.97826087 0.97826087 0.97826087 0.97826087 0.97826087 0.97826087
0.97826087 0.98447205 0.97832817 0.9752322 ]
mean value: 0.9785858508162991
key: test_fscore
value: [0.76923077 0.71428571 0.61538462 0.63157895 0.73684211 0.75
0.66666667 0.70588235 0.83333333 0.76923077]
mean value: 0.7192435273704624
key: train_fscore
value: [0.94964029 0.94964029 0.94890511 0.94890511 0.94890511 0.94890511
0.94890511 0.96296296 0.94964029 0.94202899]
mean value: 0.9498438359224818
key: test_precision
value: [0.83333333 0.71428571 0.8 0.54545455 0.63636364 0.75
0.71428571 0.66666667 1. 0.83333333]
mean value: 0.7493722943722944
key: train_precision
value: [0.94285714 0.94285714 0.94202899 0.94202899 0.94202899 0.94202899
0.94202899 0.97014925 0.94285714 0.94202899]
mean value: 0.945089459534625
key: test_recall
value: [0.71428571 0.71428571 0.5 0.75 0.875 0.75
0.625 0.75 0.71428571 0.71428571]
mean value: 0.7107142857142857
key: train_recall
value: [0.95652174 0.95652174 0.95588235 0.95588235 0.95588235 0.95588235
0.95588235 0.95588235 0.95652174 0.94202899]
mean value: 0.954688832054561
key: test_roc_auc
value: [0.83990148 0.8226601 0.73214286 0.78571429 0.86607143 0.83928571
0.77678571 0.82142857 0.85714286 0.83928571]
mean value: 0.8180418719211823
key: train_roc_auc
value: [0.97035573 0.97035573 0.97006716 0.97006716 0.97006716 0.97006716
0.97006716 0.97400417 0.97038685 0.96314048]
mean value: 0.9698578765482728
key: test_jcc
value: [0.625 0.55555556 0.44444444 0.46153846 0.58333333 0.6
0.5 0.54545455 0.71428571 0.625 ]
mean value: 0.5654612054612055
key: train_jcc
value: [0.90410959 0.90410959 0.90277778 0.90277778 0.90277778 0.90277778
0.90277778 0.92857143 0.90410959 0.89041096]
mean value: 0.9045200043487714
MCC on Blind test: 0.68
Accuracy on Blind test: 0.89
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01387548 0.01329589 0.00991011 0.01070857 0.00993299 0.00958133
0.01013231 0.00957942 0.00979066 0.00977588]
mean value: 0.01065826416015625
key: score_time
value: [0.01285505 0.01248264 0.00921583 0.00976205 0.0090754 0.00870442
0.00901771 0.0088408 0.00888538 0.00878716]
mean value: 0.00976264476776123
key: test_mcc
value: [0.49629167 0.2085873 0.66143783 0.55814043 0.35714286 0.44883281
0.46291005 0.17173552 0.61237244 0.64285714]
mean value: 0.4620308033163919
key: train_mcc
value: [0.53438367 0.60265353 0.54665085 0.50633817 0.57652074 0.53118814
0.55255244 0.57652074 0.53762725 0.53762725]
mean value: 0.5502062784000151
key: test_accuracy
value: [0.86111111 0.77777778 0.88888889 0.86111111 0.77777778 0.83333333
0.80555556 0.75 0.88571429 0.88571429]
mean value: 0.8326984126984127
key: train_accuracy
value: [0.85714286 0.8757764 0.86024845 0.85093168 0.86956522 0.85714286
0.86335404 0.86956522 0.85758514 0.85758514]
mean value: 0.8618896986712306
key: test_fscore
value: [0.54545455 0.33333333 0.66666667 0.61538462 0.5 0.5
0.58823529 0.30769231 0.6 0.71428571]
mean value: 0.537105247693483
key: train_fscore
value: [0.60344828 0.66666667 0.62184874 0.57894737 0.6440678 0.60344828
0.62068966 0.6440678 0.61016949 0.61016949]
mean value: 0.6203523557751256
key: test_precision
value: [0.75 0.4 1. 0.8 0.5 0.75
0.55555556 0.4 1. 0.71428571]
mean value: 0.686984126984127
key: train_precision
value: [0.74468085 0.78431373 0.7254902 0.7173913 0.76 0.72916667
0.75 0.76 0.73469388 0.73469388]
mean value: 0.7440430498748991
key: test_recall
value: [0.42857143 0.28571429 0.5 0.5 0.5 0.375
0.625 0.25 0.42857143 0.71428571]
mean value: 0.4607142857142857
key: train_recall
value: [0.50724638 0.57971014 0.54411765 0.48529412 0.55882353 0.51470588
0.52941176 0.55882353 0.52173913 0.52173913]
mean value: 0.5321611253196931
key: test_roc_auc
value: [0.69704433 0.591133 0.75 0.73214286 0.67857143 0.66964286
0.74107143 0.57142857 0.71428571 0.82142857]
mean value: 0.6966748768472907
key: train_roc_auc
value: [0.72990777 0.76811594 0.74449977 0.71705651 0.75578972 0.73176239
0.74108384 0.75578972 0.73527901 0.73527901]
mean value: 0.7414563679569117
key: test_jcc
value: [0.375 0.2 0.5 0.44444444 0.33333333 0.33333333
0.41666667 0.18181818 0.42857143 0.55555556]
mean value: 0.37687229437229436
key: train_jcc
value: [0.43209877 0.5 0.45121951 0.40740741 0.475 0.43209877
0.45 0.475 0.43902439 0.43902439]
mean value: 0.4500873230954532
MCC on Blind test: 0.56
Accuracy on Blind test: 0.87
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01311135 0.01836658 0.01801801 0.02214932 0.0201776 0.02150178
0.03540277 0.02377319 0.01945472 0.02329373]
mean value: 0.021524906158447266
key: score_time
value: [0.00971651 0.01126003 0.01196456 0.01231813 0.01482916 0.0148766
0.01425123 0.01228762 0.01243329 0.01276159]
mean value: 0.012669873237609864
key: test_mcc
value: [0.85096294 0.72192954 0.44883281 0.91914503 0.41267736 0.31622777
0.37067856 0.6172134 0.61237244 0.72019314]
mean value: 0.5990232995602679
key: train_mcc
value: [0.84986344 0.90206627 0.77744561 0.94588078 0.86625969 0.8365424
0.93513953 0.94378174 0.81007791 0.92534731]
mean value: 0.8792404687832709
key: test_accuracy
value: [0.94444444 0.91666667 0.83333333 0.97222222 0.80555556 0.80555556
0.75 0.86111111 0.88571429 0.91428571]
mean value: 0.8688888888888889
key: train_accuracy
value: [0.95031056 0.96583851 0.92857143 0.98136646 0.95652174 0.94720497
0.97826087 0.98136646 0.9380805 0.9752322 ]
mean value: 0.9602753687287272
key: test_fscore
value: [0.875 0.76923077 0.5 0.93333333 0.53333333 0.22222222
0.52631579 0.70588235 0.6 0.76923077]
mean value: 0.6434548569765288
key: train_fscore
value: [0.88059701 0.92307692 0.8 0.95714286 0.890625 0.864
0.94890511 0.95384615 0.83333333 0.94029851]
mean value: 0.8991824899276378
key: test_precision
value: [0.77777778 0.83333333 0.75 1. 0.57142857 1.
0.45454545 0.66666667 1. 0.83333333]
mean value: 0.7887085137085137
key: train_precision
value: [0.90769231 0.89189189 0.9787234 0.93055556 0.95 0.94736842
0.94202899 1. 0.98039216 0.96923077]
mean value: 0.9497883492048467
key: test_recall
value: [1. 0.71428571 0.375 0.875 0.5 0.125
0.625 0.75 0.42857143 0.71428571]
mean value: 0.6107142857142858
key: train_recall
value: [0.85507246 0.95652174 0.67647059 0.98529412 0.83823529 0.79411765
0.95588235 0.91176471 0.72463768 0.91304348]
mean value: 0.8611040068201193
key: test_roc_auc
value: [0.96551724 0.83990148 0.66964286 0.9375 0.69642857 0.5625
0.70535714 0.82142857 0.71428571 0.83928571]
mean value: 0.7751847290640395
key: train_roc_auc
value: [0.91567852 0.96245059 0.83626679 0.98280454 0.91321214 0.89115331
0.97006716 0.95588235 0.86035034 0.95258473]
mean value: 0.9240450475107724
key: test_jcc
value: [0.77777778 0.625 0.33333333 0.875 0.36363636 0.125
0.35714286 0.54545455 0.42857143 0.625 ]
mean value: 0.5055916305916306
key: train_jcc
value: [0.78666667 0.85714286 0.66666667 0.91780822 0.8028169 0.76056338
0.90277778 0.91176471 0.71428571 0.88732394]
mean value: 0.820781683295223
MCC on Blind test: 0.72
Accuracy on Blind test: 0.91
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01766181 0.01630974 0.0205574 0.01550651 0.01691341 0.01607966
0.01694655 0.01806903 0.01713586 0.01994872]
mean value: 0.017512869834899903
key: score_time
value: [0.0132618 0.01240635 0.01228547 0.0124495 0.01213622 0.01185417
0.01222014 0.01217818 0.01196337 0.01226044]
mean value: 0.01230156421661377
key: test_mcc
value: [0.58131836 0.34404556 0.67005939 0.75134288 0.51785714 0.5976143
0.50560765 0.51785714 0.35478744 0.61237244]
mean value: 0.5452862312081964
key: train_mcc
value: [0.77957604 0.49277338 0.66452587 0.84671817 0.85667348 0.43962631
0.72241165 0.87641313 0.69158946 0.89676152]
mean value: 0.7267069018892041
key: test_accuracy
value: [0.77777778 0.83333333 0.83333333 0.91666667 0.83333333 0.77777778
0.69444444 0.83333333 0.71428571 0.88571429]
mean value: 0.81
key: train_accuracy
value: [0.91614907 0.84782609 0.83850932 0.95031056 0.95341615 0.63043478
0.8757764 0.95962733 0.85139319 0.96594427]
mean value: 0.8789387150741304
key: test_fscore
value: [0.63636364 0.25 0.72727273 0.76923077 0.625 0.66666667
0.59259259 0.625 0.5 0.66666667]
mean value: 0.6058793058793058
key: train_fscore
value: [0.82580645 0.4494382 0.72043011 0.875 0.88372093 0.53333333
0.77011494 0.896 0.74193548 0.91603053]
mean value: 0.7611809985703716
key: test_precision
value: [0.46666667 1. 0.57142857 1. 0.625 0.5
0.42105263 0.625 0.38461538 0.8 ]
mean value: 0.6393763254289571
key: train_precision
value: [0.74418605 1. 0.56779661 0.93333333 0.93442623 0.36363636
0.63207547 0.98245614 0.58974359 0.96774194]
mean value: 0.7715395720435464
key: test_recall
value: [1. 0.14285714 1. 0.625 0.625 1.
1. 0.625 0.71428571 0.57142857]
mean value: 0.7303571428571428
key: train_recall
value: [0.92753623 0.28985507 0.98529412 0.82352941 0.83823529 1.
0.98529412 0.82352941 1. 0.86956522]
mean value: 0.8542838874680307
key: test_roc_auc
value: [0.86206897 0.57142857 0.89285714 0.8125 0.75892857 0.85714286
0.80357143 0.75892857 0.71428571 0.76785714]
mean value: 0.7799568965517242
key: train_roc_auc
value: [0.92028986 0.64492754 0.89225336 0.90389069 0.91124363 0.76574803
0.91587541 0.9097962 0.90551181 0.9308456 ]
mean value: 0.8700382121352478
key: test_jcc
value: [0.46666667 0.14285714 0.57142857 0.625 0.45454545 0.5
0.42105263 0.45454545 0.33333333 0.5 ]
mean value: 0.4469429254955571
key: train_jcc
value: [0.7032967 0.28985507 0.56302521 0.77777778 0.79166667 0.36363636
0.62616822 0.8115942 0.58974359 0.84507042]
mean value: 0.6361834233401731
MCC on Blind test: 0.53
Accuracy on Blind test: 0.79
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.15819955 0.14830995 0.14526677 0.13937092 0.14099097 0.13902879
0.1481483 0.1477356 0.14789319 0.14681125]
mean value: 0.14617552757263183
key: score_time
value: [0.01697183 0.01665115 0.01544809 0.01664472 0.01590967 0.0155158
0.01666117 0.0167737 0.01641393 0.01669216]
mean value: 0.0163682222366333
key: test_mcc
value: [1. 0.91914503 0.91914503 0.91914503 0.80582296 0.75134288
0.9258201 0.86189161 0.49391458 0.81649658]
mean value: 0.8412723806521708
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.97222222 0.97222222 0.97222222 0.91666667 0.91666667
0.97222222 0.94444444 0.85714286 0.94285714]
mean value: 0.9466666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.93333333 0.93333333 0.93333333 0.84210526 0.76923077
0.94117647 0.88888889 0.54545455 0.83333333]
mean value: 0.8620189270653666
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.875 1. 1. 0.72727273 1.
0.88888889 0.8 0.75 1. ]
mean value: 0.9041161616161616
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.875 0.875 1. 0.625
1. 1. 0.42857143 0.71428571]
mean value: 0.8517857142857143
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.98275862 0.9375 0.9375 0.94642857 0.8125
0.98214286 0.96428571 0.69642857 0.85714286]
mean value: 0.9116687192118227
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.875 0.875 0.875 0.72727273 0.625
0.88888889 0.8 0.375 0.71428571]
mean value: 0.775544733044733
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.94
Accuracy on Blind test: 0.98
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05724621 0.0500145 0.0438199 0.07104373 0.0596118 0.06134391
0.06512403 0.070961 0.0410428 0.05038881]
mean value: 0.05705966949462891
key: score_time
value: [0.02019477 0.02353168 0.0291779 0.04015303 0.0278511 0.03402472
0.0242095 0.04008889 0.02148032 0.02989411]
mean value: 0.02906060218811035
key: test_mcc
value: [1. 0.91914503 1. 0.91914503 0.80582296 0.83666003
0.77151675 0.86189161 0.49391458 0.81649658]
mean value: 0.8424592569278723
key: train_mcc
value: [0.98155468 0.98155468 0.97192696 0.99077106 0.99077106 0.9906716
0.97196923 0.97192696 0.96368577 1. ]
mean value: 0.981483200152367
key: test_accuracy
value: [1. 0.97222222 1. 0.97222222 0.91666667 0.94444444
0.91666667 0.94444444 0.85714286 0.94285714]
mean value: 0.9466666666666667
key: train_accuracy
value: [0.99378882 0.99378882 0.99068323 0.99689441 0.99689441 0.99689441
0.99068323 0.99068323 0.9876161 1. ]
mean value: 0.9937926658077418
key: test_fscore
value: [1. 0.93333333 1. 0.93333333 0.84210526 0.85714286
0.82352941 0.88888889 0.54545455 0.83333333]
mean value: 0.8657120966408892
key: train_fscore
value: [0.98550725 0.98550725 0.97777778 0.99270073 0.99270073 0.99259259
0.97744361 0.97777778 0.97142857 1. ]
mean value: 0.9853436281206914
key: test_precision
value: [1. 0.875 1. 1. 0.72727273 1.
0.77777778 0.8 0.75 1. ]
mean value: 0.8930050505050505
key: train_precision
value: [0.98550725 0.98550725 0.98507463 0.98550725 0.98550725 1.
1. 0.98507463 0.95774648 1. ]
mean value: 0.9869924718111829
key: test_recall
value: [1. 1. 1. 0.875 1. 0.75
0.875 1. 0.42857143 0.71428571]
mean value: 0.8642857142857143
key: train_recall
value: [0.98550725 0.98550725 0.97058824 1. 1. 0.98529412
0.95588235 0.97058824 0.98550725 1. ]
mean value: 0.9838874680306906
key: test_roc_auc
value: [1. 0.98275862 1. 0.9375 0.94642857 0.875
0.90178571 0.96428571 0.69642857 0.85714286]
mean value: 0.9161330049261084
key: train_roc_auc
value: [0.99077734 0.99077734 0.98332561 0.9980315 0.9980315 0.99264706
0.97794118 0.98332561 0.98684811 1. ]
mean value: 0.9901705243424437
key: test_jcc
value: [1. 0.875 1. 0.875 0.72727273 0.75
0.7 0.8 0.375 0.71428571]
mean value: 0.7816558441558441
key: train_jcc
value: [0.97142857 0.97142857 0.95652174 0.98550725 0.98550725 0.98529412
0.95588235 0.95652174 0.94444444 1. ]
mean value: 0.9712536028904315
MCC on Blind test: 0.87
Accuracy on Blind test: 0.96
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.08166194 0.14968729 0.11179137 0.07614708 0.08787417 0.1285069
0.12228823 0.12631869 0.09615564 0.09222722]
mean value: 0.10726585388183593
key: score_time
value: [0.02172256 0.02620792 0.02364445 0.01462293 0.022928 0.02531552
0.02583742 0.02750945 0.02175593 0.02795911]
mean value: 0.02375032901763916
key: test_mcc
value: [ 0.1872493 0.1872493 -0.09035079 0. -0.12964074 0.
0.2362278 0.45374261 -0.08574929 0. ]
mean value: 0.07587281768515772
key: train_mcc
value: [0.91634855 0.93507164 0.93434457 0.90588785 0.89634849 0.91539921
0.93434457 0.90588785 0.90701894 0.91641052]
mean value: 0.9167062185712945
key: test_accuracy
value: [0.80555556 0.80555556 0.75 0.77777778 0.72222222 0.77777778
0.77777778 0.83333333 0.77142857 0.8 ]
mean value: 0.7821428571428571
key: train_accuracy
value: [0.97204969 0.97826087 0.97826087 0.9689441 0.96583851 0.97204969
0.97826087 0.9689441 0.96904025 0.97213622]
mean value: 0.972378516624041
key: test_fscore
value: [0.22222222 0.22222222 0. 0. 0. 0.
0.33333333 0.4 0. 0. ]
mean value: 0.11777777777777779
key: train_fscore
value: [0.93023256 0.94656489 0.94573643 0.92063492 0.912 0.92913386
0.94573643 0.92063492 0.921875 0.93023256]
mean value: 0.9302781569529865
key: test_precision
value: [0.5 0.5 0. 0. 0. 0. 0.5 1. 0. 0. ]
mean value: 0.25
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.14285714 0.14285714 0. 0. 0. 0.
0.25 0.25 0. 0. ]
mean value: 0.07857142857142857
key: train_recall
value: [0.86956522 0.89855072 0.89705882 0.85294118 0.83823529 0.86764706
0.89705882 0.85294118 0.85507246 0.86956522]
mean value: 0.8698635976129583
key: test_roc_auc
value: [0.55418719 0.55418719 0.48214286 0.5 0.46428571 0.5
0.58928571 0.625 0.48214286 0.5 ]
mean value: 0.5251231527093596
key: train_roc_auc
value: [0.93478261 0.94927536 0.94852941 0.92647059 0.91911765 0.93382353
0.94852941 0.92647059 0.92753623 0.93478261]
mean value: 0.9349317988064791
key: test_jcc
value: [0.125 0.125 0. 0. 0. 0. 0.2 0.25 0. 0. ]
mean value: 0.07
key: train_jcc
value: [0.86956522 0.89855072 0.89705882 0.85294118 0.83823529 0.86764706
0.89705882 0.85294118 0.85507246 0.86956522]
mean value: 0.8698635976129583
MCC on Blind test: 0.11
Accuracy on Blind test: 0.78
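The Gaussian Process scores above pair a test accuracy near 0.78 with a test MCC near 0.08. A minimal sketch of why, assuming the first fold's confusion counts are TP=1, FP=1, FN=6, TN=28. These counts are not printed in the log; they are inferred here from the logged fold-1 precision (0.5), recall (0.14285714) and accuracy (0.80555556). The helper name `fold_metrics` is hypothetical, and treating `test_roc_auc` as the mean of recall and specificity is an assumption that happens to match the logged value.

```python
import math

def fold_metrics(tp, fp, fn, tn):
    """Binary-classification metrics computed from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Denominator of the Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
        "jcc": tp / (tp + fp + fn) if (tp + fp + fn) else 0.0,
        # Assumption: AUC taken on hard labels, i.e. balanced accuracy.
        "roc_auc": (recall + specificity) / 2,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Counts inferred (assumption) from the logged fold-1 Gaussian Process scores.
gp_fold1 = fold_metrics(tp=1, fp=1, fn=6, tn=28)
for name, value in gp_fold1.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the fold holds only 7 positives out of 36 samples, so predicting almost everything as negative still gets 29 of 36 right. That would explain accuracy staying high while MCC, F-score and recall stay low.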
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.52069354 0.50328159 0.50455189 0.50025082 0.50435662 0.50333905
0.49607444 0.50226235 0.50641394 0.49668527]
mean value: 0.5037909507751465
key: score_time
value: [0.00986624 0.00982571 0.01003551 0.00934696 0.00967073 0.00945807
0.0095005 0.00956893 0.01015067 0.01010966]
mean value: 0.00975329875946045
key: test_mcc
value: [1. 0.8226601 1. 0.91914503 0.71098137 0.75134288
0.86189161 0.86189161 0.61237244 1. ]
mean value: 0.8540285030671064
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.94444444 1. 0.97222222 0.86111111 0.91666667
0.94444444 0.94444444 0.88571429 1. ]
mean value: 0.9469047619047619
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.85714286 1. 0.93333333 0.76190476 0.76923077
0.88888889 0.88888889 0.66666667 1. ]
mean value: 0.8766056166056166
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.85714286 1. 1. 0.61538462 1.
0.8 0.8 0.8 1. ]
mean value: 0.8872527472527473
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.85714286 1. 0.875 1. 0.625
1. 1. 0.57142857 1. ]
mean value: 0.8928571428571428
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.91133005 1. 0.9375 0.91071429 0.8125
0.96428571 0.96428571 0.76785714 1. ]
mean value: 0.9268472906403941
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.75 1. 0.875 0.61538462 0.625
0.8 0.8 0.5 1. ]
mean value: 0.7965384615384615
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.91
Accuracy on Blind test: 0.97
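Each metric above is logged as a `key:` line, a `value:` array that may wrap onto a second line, and a `mean value:` line. A minimal sketch for reading such triplets back into per-fold lists, assuming exactly that layout. The function name `parse_cv_block` is hypothetical and not part of the original scripts; the sample text is copied from the Gradient Boosting `test_mcc` and `test_accuracy` entries.

```python
import re

def parse_cv_block(text):
    """Parse 'key / value / mean value' triplets into {key: (folds, mean)}."""
    results = {}
    pattern = re.compile(
        r"key:\s*(\S+)\s*value:\s*\[(.*?)\]\s*mean value:\s*(\S+)",
        re.DOTALL,  # the value array may wrap across lines
    )
    for key, values, mean in pattern.findall(text):
        folds = [float(v) for v in values.split()]
        results[key] = (folds, float(mean))
    return results

sample = """key: test_mcc
value: [1. 0.8226601 1. 0.91914503 0.71098137 0.75134288
0.86189161 0.86189161 0.61237244 1. ]
mean value: 0.8540285030671064
key: test_accuracy
value: [1. 0.94444444 1. 0.97222222 0.86111111 0.91666667
0.94444444 0.94444444 0.88571429 1. ]
mean value: 0.9469047619047619
"""

parsed = parse_cv_block(sample)
for key, (folds, mean) in parsed.items():
    # Recompute the mean from the per-fold values and compare with the log.
    recomputed = sum(folds) / len(folds)
    print(key, len(folds), round(recomputed, 6), round(mean, 6))
```

The sketch does not handle entries where stderr warnings are interleaved between `value:` and the array; those would need to be separated first.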
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02580166 0.02822471 0.02490973 0.04556131 0.0245204 0.02402449
0.02442002 0.02469778 0.02477241 0.02475023]
mean value: 0.02716827392578125
key: score_time
value: [0.01345968 0.01275158 0.01257443 0.01375508 0.01480269 0.01374626
0.01462483 0.01372385 0.01461887 0.0150516 ]
mean value: 0.013910889625549316
key: test_mcc
value: [-0.08304548 -0.11915865 -0.16116459 -0.23904572 -0.12964074 -0.3086067
0.32232919 -0.12964074 -0.15309311 0.15161961]
mean value: -0.08494469446571944
key: train_mcc
value: [0.24048671 0.24048671 0.28810855 0.21675985 0.28810855 0.2663143
0.18742507 0.24272682 0.24058235 0.15144495]
mean value: 0.23624438625695043
key: test_accuracy
value: [0.77777778 0.75 0.69444444 0.61111111 0.72222222 0.52777778
0.80555556 0.72222222 0.71428571 0.74285714]
mean value: 0.7068253968253968
key: train_accuracy
value: [0.80124224 0.80124224 0.81055901 0.80124224 0.81055901 0.80745342
0.79813665 0.80434783 0.80185759 0.79256966]
mean value: 0.8029209853277696
key: test_fscore
value: [0. 0. 0. 0. 0. 0.
0.36363636 0. 0. 0.30769231]
mean value: 0.06713286713286713
key: train_fscore
value: [0.13513514 0.13513514 0.18666667 0.11111111 0.18666667 0.16216216
0.08450704 0.1369863 0.13513514 0.05633803]
mean value: 0.13298433838044102
key: test_precision
value: [0. 0. 0. 0. 0. 0.
0.66666667 0. 0. 0.33333333]
mean value: 0.09999999999999999
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0. 0. 0. 0. 0. 0.
0.25 0. 0. 0.28571429]
mean value: 0.05357142857142857
key: train_recall
value: [0.07246377 0.07246377 0.10294118 0.05882353 0.10294118 0.08823529
0.04411765 0.07352941 0.07246377 0.02898551]
mean value: 0.07169650468883206
key: test_roc_auc
value: [0.48275862 0.46551724 0.44642857 0.39285714 0.46428571 0.33928571
0.60714286 0.46428571 0.44642857 0.57142857]
mean value: 0.4680418719211823
key: train_roc_auc
value: [0.53623188 0.53623188 0.55147059 0.52941176 0.55147059 0.54411765
0.52205882 0.53676471 0.53623188 0.51449275]
mean value: 0.535848252344416
key: test_jcc
value: [0. 0. 0. 0. 0. 0.
0.22222222 0. 0. 0.18181818]
mean value: 0.0404040404040404
key: train_jcc
value: [0.07246377 0.07246377 0.10294118 0.05882353 0.10294118 0.08823529
0.04411765 0.07352941 0.07246377 0.02898551]
mean value: 0.07169650468883206
MCC on Blind test: 0.02
Accuracy on Blind test: 0.77
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02734208 0.03658342 0.03627729 0.03802681 0.03607988 0.03623199
0.07836962 0.0335741 0.04661655 0.03537488]
mean value: 0.040447664260864255
key: score_time
value: [0.02119327 0.02037835 0.02254105 0.02403259 0.02666831 0.02385139
0.02655315 0.0224328 0.02348709 0.02180433]
mean value: 0.023294234275817872
key: test_mcc
value: [1. 0.6144869 0.55814043 0.75134288 0.46291005 0.66143783
0.67857143 0.77151675 0.71842121 0.7484552 ]
mean value: 0.6965282676681155
key: train_mcc
value: [0.89727565 0.88757529 0.87718604 0.86725712 0.88588911 0.88708251
0.90539133 0.91509932 0.88839586 0.89735962]
mean value: 0.89085118703998
key: test_accuracy
value: [1. 0.88888889 0.86111111 0.91666667 0.80555556 0.88888889
0.88888889 0.91666667 0.91428571 0.91428571]
mean value: 0.8995238095238095
key: train_accuracy
value: [0.96583851 0.96273292 0.95962733 0.95652174 0.96273292 0.96273292
0.9689441 0.97204969 0.9628483 0.96594427]
mean value: 0.9639972693883045
key: test_fscore
value: [1. 0.66666667 0.61538462 0.76923077 0.58823529 0.66666667
0.75 0.82352941 0.72727273 0.8 ]
mean value: 0.7406986151103798
key: train_fscore
value: [0.91851852 0.91044776 0.90225564 0.89393939 0.90769231 0.91044776
0.92424242 0.93233083 0.91176471 0.91851852]
mean value: 0.9130157857346989
key: test_precision
value: [1. 0.8 0.8 1. 0.55555556 1.
0.75 0.77777778 1. 0.75 ]
mean value: 0.8433333333333334
key: train_precision
value: [0.93939394 0.93846154 0.92307692 0.921875 0.9516129 0.92424242
0.953125 0.95384615 0.92537313 0.93939394]
mean value: 0.9370400955969084
key: test_recall
value: [1. 0.57142857 0.5 0.625 0.625 0.5
0.75 0.875 0.57142857 0.85714286]
mean value: 0.6875
key: train_recall
value: [0.89855072 0.88405797 0.88235294 0.86764706 0.86764706 0.89705882
0.89705882 0.91176471 0.89855072 0.89855072]
mean value: 0.8903239556692242
key: test_roc_auc
value: [1. 0.76847291 0.73214286 0.8125 0.74107143 0.75
0.83928571 0.90178571 0.78571429 0.89285714]
mean value: 0.8223830049261084
key: train_roc_auc
value: [0.94137022 0.93412385 0.93133395 0.92398101 0.92791802 0.93868689
0.9426239 0.94997684 0.93943284 0.94140135]
mean value: 0.9370848871745019
key: test_jcc
value: [1. 0.5 0.44444444 0.625 0.41666667 0.5
0.6 0.7 0.57142857 0.66666667]
mean value: 0.6024206349206349
key: train_jcc
value: [0.84931507 0.83561644 0.82191781 0.80821918 0.83098592 0.83561644
0.85915493 0.87323944 0.83783784 0.84931507]
mean value: 0.8401218119527979
MCC on Blind test: 0.84
Accuracy on Blind test: 0.94
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.24777889 0.24893379 0.25687099 0.27457643 0.29407358 0.27596569
0.33060312 0.30342221 0.28261089 0.29954863]
mean value: 0.281438422203064
key: score_time
value: [0.025563 0.02152777 0.02327323 0.02313852 0.02621269 0.02568793
0.02634549 0.02385783 0.03024006 0.02442074]
mean value: 0.025026726722717284
key: test_mcc
value: [1. 0.6144869 0.55814043 0.75032247 0.46291005 0.66143783
0.67857143 0.77151675 0.71842121 0.7484552 ]
mean value: 0.6964262266368373
key: train_mcc
value: [0.9358192 0.88757529 0.87718604 0.93513953 0.88588911 0.88708251
0.94407133 0.94407133 0.88839586 0.89735962]
mean value: 0.9082589838760209
key: test_accuracy
value: [1. 0.88888889 0.86111111 0.91666667 0.80555556 0.88888889
0.88888889 0.91666667 0.91428571 0.91428571]
mean value: 0.8995238095238095
key: train_accuracy
value: [0.97826087 0.96273292 0.95962733 0.97826087 0.96273292 0.96273292
0.98136646 0.98136646 0.9628483 0.96594427]
mean value: 0.9695873315001058
key: test_fscore
value: [1. 0.66666667 0.61538462 0.8 0.58823529 0.66666667
0.75 0.82352941 0.72727273 0.8 ]
mean value: 0.7437755381873029
key: train_fscore
value: [0.94964029 0.91044776 0.90225564 0.94890511 0.90769231 0.91044776
0.95588235 0.95588235 0.91176471 0.91851852]
mean value: 0.9271436796720172
key: test_precision
value: [1. 0.8 0.8 0.85714286 0.55555556 1.
0.75 0.77777778 1. 0.75 ]
mean value: 0.829047619047619
key: train_precision
value: [0.94285714 0.93846154 0.92307692 0.94202899 0.9516129 0.92424242
0.95588235 0.95588235 0.92537313 0.93939394]
mean value: 0.9398811696975732
key: test_recall
value: [1. 0.57142857 0.5 0.75 0.625 0.5
0.75 0.875 0.57142857 0.85714286]
mean value: 0.7
key: train_recall
value: [0.95652174 0.88405797 0.88235294 0.95588235 0.86764706 0.89705882
0.95588235 0.95588235 0.89855072 0.89855072]
mean value: 0.9152387041773231
key: test_roc_auc
value: [1. 0.76847291 0.73214286 0.85714286 0.74107143 0.75
0.83928571 0.90178571 0.78571429 0.89285714]
mean value: 0.8268472906403941
key: train_roc_auc
value: [0.97035573 0.93412385 0.93133395 0.97006716 0.92791802 0.93868689
0.97203566 0.97203566 0.93943284 0.94140135]
mean value: 0.9497391118222522
key: test_jcc
value: [1. 0.5 0.44444444 0.66666667 0.41666667 0.5
0.6 0.7 0.57142857 0.66666667]
mean value: 0.6065873015873016
key: train_jcc
value: [0.90410959 0.83561644 0.82191781 0.90277778 0.83098592 0.83561644
0.91549296 0.91549296 0.83783784 0.84931507]
mean value: 0.8649162789067284
MCC on Blind test: 0.76
Accuracy on Blind test: 0.92
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:107: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:110: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03773212 0.03759193 0.03757524 0.03828263 0.0400517 0.03780508
0.03701949 0.04745054 0.04011869 0.03937149]
mean value: 0.03929989337921143
key: score_time
value: [0.01247263 0.01345062 0.01343799 0.01589584 0.01312828 0.01231027
0.01372886 0.01359797 0.01646304 0.01724863]
mean value: 0.014173412322998047
key: test_mcc
value: [0.82942474 0.75492611 0.75492611 0.92980296 0.75434227 0.89802651
0.75434227 0.82618439 0.85933785 0.93094934]
mean value: 0.8292262532717305
key: train_mcc
value: [0.9094503 0.921366 0.90138807 0.89754406 0.9212884 0.92916266
0.91738682 0.90945587 0.91732994 0.90562412]
mean value: 0.9129996244909306
key: test_accuracy
value: [0.9122807 0.87719298 0.87719298 0.96491228 0.875 0.94642857
0.875 0.91071429 0.92857143 0.96428571]
mean value: 0.9131578947368421
key: train_accuracy
value: [0.95463511 0.96055227 0.95069034 0.94871795 0.96062992 0.96456693
0.95866142 0.95472441 0.95866142 0.95275591]
mean value: 0.9564595660749506
key: test_fscore
value: [0.91525424 0.87719298 0.87719298 0.96551724 0.88135593 0.94339623
0.88135593 0.91525424 0.92592593 0.96551724]
mean value: 0.9147962938994972
key: train_fscore
value: [0.95427435 0.96015936 0.95069034 0.94820717 0.96047431 0.96442688
0.95841584 0.95463511 0.95857988 0.95238095]
mean value: 0.956224419292093
key: test_precision
value: [0.87096774 0.86206897 0.89285714 0.96551724 0.83870968 1.
0.83870968 0.87096774 0.96153846 0.93333333]
mean value: 0.9034669983335167
key: train_precision
value: [0.96385542 0.97177419 0.9488189 0.95582329 0.96428571 0.96825397
0.96414343 0.95652174 0.96047431 0.96 ]
mean value: 0.9613950962310953
key: test_recall
value: [0.96428571 0.89285714 0.86206897 0.96551724 0.92857143 0.89285714
0.92857143 0.96428571 0.89285714 1. ]
mean value: 0.9291871921182266
key: train_recall
value: [0.94488189 0.9488189 0.95256917 0.94071146 0.95669291 0.96062992
0.95275591 0.95275591 0.95669291 0.94488189]
mean value: 0.951139086863154
key: test_roc_auc
value: [0.91317734 0.87746305 0.87746305 0.96490148 0.875 0.94642857
0.875 0.91071429 0.92857143 0.96428571]
mean value: 0.9133004926108375
key: train_roc_auc
value: [0.95465438 0.96057546 0.95069403 0.94870219 0.96062992 0.96456693
0.95866142 0.95472441 0.95866142 0.95275591]
mean value: 0.9564626062058448
key: test_jcc
value: [0.84375 0.78125 0.78125 0.93333333 0.78787879 0.89285714
0.78787879 0.84375 0.86206897 0.93333333]
mean value: 0.8447350350798627
key: train_jcc
value: [0.91254753 0.92337165 0.90601504 0.90151515 0.92395437 0.93129771
0.92015209 0.91320755 0.92045455 0.90909091]
mean value: 0.9161606540653082
MCC on Blind test: 0.72
Accuracy on Blind test: 0.9
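The key / value / mean value blocks above follow a regular layout, so they can be read back programmatically. The following is a minimal sketch, not part of the original script: the names `LOG_EXCERPT`, `BLOCK_RE` and `parse_cv_blocks` are hypothetical, and the excerpt contains only the two MCC blocks copied from the Logistic Regression output above. It recomputes each mean from the per-fold values and reports the train/test gap.

```python
import re
from statistics import fmean

# Two blocks copied verbatim from the Logistic Regression output above.
LOG_EXCERPT = """\
key: test_mcc
value: [0.82942474 0.75492611 0.75492611 0.92980296 0.75434227 0.89802651
0.75434227 0.82618439 0.85933785 0.93094934]
mean value: 0.8292262532717305
key: train_mcc
value: [0.9094503 0.921366 0.90138807 0.89754406 0.9212884 0.92916266
0.91738682 0.90945587 0.91732994 0.90562412]
mean value: 0.9129996244909306
"""

# Hypothetical pattern: metric name, bracketed fold values (may span lines), logged mean.
BLOCK_RE = re.compile(r"key: (\w+)\s+value: \[([^\]]*)\]\s+mean value: (\S+)")


def parse_cv_blocks(text):
    """Return {metric: (fold_values, logged_mean)} for each key/value/mean block."""
    parsed = {}
    for name, values, mean in BLOCK_RE.findall(text):
        parsed[name] = ([float(v) for v in values.split()], float(mean))
    return parsed


scores = parse_cv_blocks(LOG_EXCERPT)
for name, (folds, logged_mean) in scores.items():
    # The recomputed mean should agree with the logged mean up to print rounding.
    print(name, "folds:", len(folds), "recomputed mean:", fmean(folds), "logged mean:", logged_mean)

# Difference between mean training and mean cross-validation MCC.
gap = scores["train_mcc"][1] - scores["test_mcc"][1]
print("train - test MCC gap:", gap)
```

Comparing the train and test means this way gives a quick view of how much each model overfits across the 10 folds; the blind-test figures are reported separately in the log and are not parsed here.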
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
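The warning above suggests raising max_iter or scaling the data. The logged pipeline already applies MinMaxScaler to the numeric columns, so the remaining option is a larger iteration budget (or a different solver). The following is a minimal sketch under stated assumptions, not the original script: the data comes from `make_classification` as a synthetic stand-in, the pipeline is simplified to a single scaler step, and `max_iter=5000` is an arbitrary illustrative value.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; the real feature matrix is not part of this log.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    # max_iter raised from the scikit-learn default of 100 (assumed value: 5000).
    ("model", LogisticRegressionCV(max_iter=5000, random_state=42)),
])

# Record warnings instead of printing them, so they can be counted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipe.fit(X, y)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings raised:", n_conv)
print("training accuracy:", pipe.score(X, y))
```

Counting the recorded warnings, rather than silencing them, makes it possible to check whether a given max_iter is sufficient on the real data before committing to it.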
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.94869375 0.87011933 1.19105005 0.99319768 0.97389174 0.92805409
0.93025017 0.93565726 0.83603239 1.05092835]
mean value: 0.9657874822616577
key: score_time
value: [0.01407504 0.01457548 0.01895332 0.02009797 0.01391673 0.01355195
0.01422977 0.01349163 0.01395845 0.01340699]
mean value: 0.015025734901428223
key: test_mcc
value: [0.82942474 0.8951918 0.86189955 0.96547546 0.85933785 0.93094934
0.93094934 0.93094934 0.89342711 0.93094934]
mean value: 0.9028553849533496
key: train_mcc
value: [0.98817342 0.98028353 0.98817323 0.98817323 0.99607071 0.99212598
0.98819663 1. 0.99212598 0.98032256]
mean value: 0.9893645282093079
key: test_accuracy
value: [0.9122807 0.94736842 0.92982456 0.98245614 0.92857143 0.96428571
0.96428571 0.96428571 0.94642857 0.96428571]
mean value: 0.9504072681704261
key: train_accuracy
value: [0.99408284 0.99013807 0.99408284 0.99408284 0.9980315 0.99606299
0.99409449 1. 0.99606299 0.99015748]
mean value: 0.9946796036590101
key: test_fscore
value: [0.91525424 0.94545455 0.92857143 0.98305085 0.93103448 0.96296296
0.96551724 0.96551724 0.94736842 0.96551724]
mean value: 0.9510248649683883
key: train_fscore
value: [0.99408284 0.99017682 0.99405941 0.99405941 0.99802761 0.99606299
0.99408284 1. 0.99606299 0.99017682]
mean value: 0.9946791724596361
key: test_precision
value: [0.87096774 0.96296296 0.96296296 0.96666667 0.9 1.
0.93333333 0.93333333 0.93103448 0.93333333]
mean value: 0.9394594817286697
key: train_precision
value: [0.99604743 0.98823529 0.99603175 0.99603175 1. 0.99606299
0.99604743 1. 0.99606299 0.98823529]
mean value: 0.9952754926210834
key: test_recall
value: [0.96428571 0.92857143 0.89655172 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9646551724137931
key: train_recall
value: [0.99212598 0.99212598 0.99209486 0.99209486 0.99606299 0.99606299
0.99212598 1. 0.99606299 0.99212598]
mean value: 0.9940882636705985
key: test_roc_auc
value: [0.91317734 0.94704433 0.93041872 0.98214286 0.92857143 0.96428571
0.96428571 0.96428571 0.94642857 0.96428571]
mean value: 0.9504926108374385
key: train_roc_auc
value: [0.99408671 0.99013414 0.99407893 0.99407893 0.9980315 0.99606299
0.99409449 1. 0.99606299 0.99015748]
mean value: 0.9946788148517008
key: test_jcc
value: [0.84375 0.89655172 0.86666667 0.96666667 0.87096774 0.92857143
0.93333333 0.93333333 0.9 0.93333333]
mean value: 0.9073174227978177
key: train_jcc
value: [0.98823529 0.98054475 0.98818898 0.98818898 0.99606299 0.99215686
0.98823529 1. 0.99215686 0.98054475]
mean value: 0.9894314752770804
MCC on Blind test: 0.81
Accuracy on Blind test: 0.93
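The per-fold metrics above can be cross-checked against a single confusion matrix. The following is a minimal sketch, assuming the sixth test fold (precision 1.0, recall 0.92857143, accuracy 0.96428571) held 56 samples, 28 per class, which gives TP=26, FN=2, FP=0, TN=28. These counts are inferred from the logged values and are not printed by the script; the function name binary_metrics is hypothetical.

```python
from math import sqrt


def binary_metrics(tp, fn, fp, tn):
    """Compute the binary-classification metrics named in the log from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        # F1 score: harmonic mean of precision and recall.
        "fscore": 2 * tp / (2 * tp + fp + fn),
        # Jaccard index of the positive class.
        "jcc": tp / (tp + fp + fn),
        # Matthews correlation coefficient.
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # Mean of recall and specificity.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# ASSUMPTION: counts inferred from the logged fold values, not taken from the log.
fold6 = binary_metrics(tp=26, fn=2, fp=0, tn=28)
for name, value in fold6.items():
    print(f"{name}: {value:.8f}")
```

Under that assumption the computed mcc, fscore and jcc agree with the logged 0.93094934, 0.96296296 and 0.92857143. The balanced accuracy also equals the logged test_roc_auc of 0.96428571, which is what one would expect if the AUC were scored on hard class predictions rather than probabilities; the log alone does not confirm this.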
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01455855 0.01241016 0.01061916 0.01032066 0.01021433 0.00997758
0.01221704 0.01006413 0.01007271 0.01009989]
mean value: 0.011055421829223634
key: score_time
value: [0.01239634 0.00947404 0.00947428 0.00910234 0.00912118 0.00902677
0.00919771 0.00907707 0.00964546 0.00893784]
mean value: 0.009545302391052246
key: test_mcc
value: [0.47713554 0.553659 0.75462449 0.54592083 0.60753044 0.64285714
0.5118907 0.67082039 0.39310793 0.5118907 ]
mean value: 0.5669437167779643
key: train_mcc
value: [0.61830137 0.59162207 0.59295071 0.61416745 0.67031032 0.64585416
0.65661014 0.65225378 0.62955117 0.6032316 ]
mean value: 0.6274852770592964
key: test_accuracy
value: [0.73684211 0.77192982 0.87719298 0.77192982 0.80357143 0.82142857
0.75 0.82142857 0.69642857 0.75 ]
mean value: 0.7800751879699248
key: train_accuracy
value: [0.80473373 0.78303748 0.79487179 0.80473373 0.83464567 0.81889764
0.82480315 0.82283465 0.81299213 0.7992126 ]
mean value: 0.8100762552609918
key: test_fscore
value: [0.74576271 0.78688525 0.88135593 0.78688525 0.8 0.82142857
0.77419355 0.84375 0.70175439 0.77419355]
mean value: 0.7916209190038752
key: train_fscore
value: [0.82032668 0.81099656 0.80451128 0.81564246 0.82995951 0.83211679
0.83669725 0.83455882 0.82242991 0.81111111]
mean value: 0.821835037001602
key: test_precision
value: [0.70967742 0.72727273 0.86666667 0.75 0.81481481 0.82142857
0.70588235 0.75 0.68965517 0.70588235]
mean value: 0.7541280077833765
key: train_precision
value: [0.76094276 0.7195122 0.76702509 0.77112676 0.85416667 0.7755102
0.78350515 0.78275862 0.78291815 0.76573427]
mean value: 0.7763199867511414
key: test_recall
value: [0.78571429 0.85714286 0.89655172 0.82758621 0.78571429 0.82142857
0.85714286 0.96428571 0.71428571 0.85714286]
mean value: 0.8366995073891625
key: train_recall
value: [0.88976378 0.92913386 0.8458498 0.86561265 0.80708661 0.8976378
0.8976378 0.89370079 0.86614173 0.86220472]
mean value: 0.8754769537207059
key: test_roc_auc
value: [0.73768473 0.77339901 0.87684729 0.77093596 0.80357143 0.82142857
0.75 0.82142857 0.69642857 0.75 ]
mean value: 0.7801724137931034
key: train_roc_auc
value: [0.80456568 0.78274875 0.79497215 0.80485357 0.83464567 0.81889764
0.82480315 0.82283465 0.81299213 0.7992126 ]
mean value: 0.8100525971802932
key: test_jcc
value: [0.59459459 0.64864865 0.78787879 0.64864865 0.66666667 0.6969697
0.63157895 0.72972973 0.54054054 0.63157895]
mean value: 0.6576835208414156
key: train_jcc
value: [0.69538462 0.68208092 0.67295597 0.68867925 0.70934256 0.7125
0.7192429 0.71608833 0.6984127 0.68224299]
mean value: 0.6976930240270341
MCC on Blind test: 0.2
Accuracy on Blind test: 0.67
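Each model block in this log follows the same layout: a Model_name line, repeated key / value / mean value triples, then two blind-test lines. As a rough sketch of how those summaries might be collected programmatically, the parser below reads that layout from a string. The function name parse_mean_values and the dictionary keys blind_mcc and blind_accuracy are hypothetical and are not part of the original script; the sample text is copied from the Gaussian NB block above.

```python
def parse_mean_values(log_text):
    """Map each model name to {metric: mean value} plus its blind-test scores."""
    results = {}
    model = None
    key = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Model_name:"):
            model = line.split(":", 1)[1].strip()
            results.setdefault(model, {})
        elif line.startswith("key:"):
            key = line.split(":", 1)[1].strip()
        elif model is None:
            continue  # metrics seen before any Model_name line are skipped
        elif line.startswith("mean value:") and key is not None:
            results[model][key] = float(line.split(":", 1)[1])
        elif line.startswith("MCC on Blind test:"):
            results[model]["blind_mcc"] = float(line.split(":", 1)[1])
        elif line.startswith("Accuracy on Blind test:"):
            results[model]["blind_accuracy"] = float(line.split(":", 1)[1])
    return results


sample = """\
Model_name: Gaussian NB
Model func: GaussianNB()
key: test_mcc
value: [0.47713554 0.553659 0.75462449 0.54592083 0.60753044 0.64285714
0.5118907 0.67082039 0.39310793 0.5118907 ]
mean value: 0.5669437167779643
MCC on Blind test: 0.2
Accuracy on Blind test: 0.67
"""

parsed = parse_mean_values(sample)
print(parsed)
```

The per-fold value arrays are deliberately ignored here; only the mean value and blind-test lines are kept.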
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01165748 0.01036167 0.0107286 0.0104363 0.01048374 0.01035953
0.0104208 0.01058149 0.01036954 0.01047564]
mean value: 0.010587477684020996
key: score_time
value: [0.00932479 0.00897956 0.00896621 0.00899768 0.00909901 0.00911689
0.00907898 0.00910473 0.00895834 0.00902915]
mean value: 0.009065532684326172
key: test_mcc
value: [0.58562417 0.62473685 0.50927421 0.57973205 0.60753044 0.71428571
0.64951905 0.72168784 0.67900461 0.39310793]
mean value: 0.6064502839116733
key: train_mcc
value: [0.63864108 0.67343572 0.67495523 0.65362362 0.63188315 0.6387663
0.65228602 0.64665231 0.67097829 0.6472967 ]
mean value: 0.6528518419173023
key: test_accuracy
value: [0.78947368 0.80701754 0.75438596 0.78947368 0.80357143 0.85714286
0.82142857 0.85714286 0.83928571 0.69642857]
mean value: 0.8015350877192983
key: train_accuracy
value: [0.81854043 0.83629191 0.83629191 0.82642998 0.81496063 0.81889764
0.82480315 0.82283465 0.83464567 0.82283465]
mean value: 0.8256530618583919
key: test_fscore
value: [0.8 0.81967213 0.76666667 0.8 0.80701754 0.85714286
0.83333333 0.86666667 0.83636364 0.69090909]
mean value: 0.8077771926089441
key: train_fscore
value: [0.82509506 0.84069098 0.84250474 0.83011583 0.8219697 0.82375479
0.83239171 0.82758621 0.84030418 0.82889734]
mean value: 0.8313310537668297
key: test_precision
value: [0.75 0.75757576 0.74193548 0.77419355 0.79310345 0.85714286
0.78125 0.8125 0.85185185 0.7037037 ]
mean value: 0.7823256650808097
key: train_precision
value: [0.79779412 0.82022472 0.81021898 0.81132075 0.7919708 0.80223881
0.79783394 0.80597015 0.8125 0.80147059]
mean value: 0.8051542850964287
key: test_recall
value: [0.85714286 0.89285714 0.79310345 0.82758621 0.82142857 0.85714286
0.89285714 0.92857143 0.82142857 0.67857143]
mean value: 0.8370689655172414
key: train_recall
value: [0.85433071 0.86220472 0.87747036 0.84980237 0.85433071 0.84645669
0.87007874 0.8503937 0.87007874 0.85826772]
mean value: 0.8593414459556192
key: test_roc_auc
value: [0.79064039 0.80849754 0.75369458 0.7887931 0.80357143 0.85714286
0.82142857 0.85714286 0.83928571 0.69642857]
mean value: 0.8016625615763546
key: train_roc_auc
value: [0.8184697 0.8362407 0.83637297 0.82647599 0.81496063 0.81889764
0.82480315 0.82283465 0.83464567 0.82283465]
mean value: 0.8256535744296786
key: test_jcc
value: [0.66666667 0.69444444 0.62162162 0.66666667 0.67647059 0.75
0.71428571 0.76470588 0.71875 0.52777778]
mean value: 0.6801389362051127
key: train_jcc
value: [0.70226537 0.72516556 0.72786885 0.70957096 0.6977492 0.70032573
0.71290323 0.70588235 0.72459016 0.70779221]
mean value: 0.7114113624151682
MCC on Blind test: 0.31
Accuracy on Blind test: 0.73
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00955963 0.01076388 0.0108633 0.01068926 0.01065207 0.01071215
0.00975442 0.01087141 0.01065636 0.01067686]
mean value: 0.010519933700561524
key: score_time
value: [0.01791644 0.01392055 0.01368237 0.01361632 0.01391411 0.01670647
0.01601553 0.01230955 0.01352668 0.01577139]
mean value: 0.014737939834594727
key: test_mcc
value: [0.62473685 0.82490815 0.47519927 0.65018988 0.72168784 0.72168784
0.65814518 0.8660254 0.71611487 0.68250015]
mean value: 0.6941195430259357
key: train_mcc
value: [0.82324487 0.81065015 0.84223222 0.79510329 0.80709287 0.80337378
0.79936749 0.79163927 0.81142619 0.83148876]
mean value: 0.8115618890001493
key: test_accuracy
value: [0.80701754 0.9122807 0.73684211 0.8245614 0.85714286 0.85714286
0.82142857 0.92857143 0.85714286 0.83928571]
mean value: 0.844141604010025
key: train_accuracy
value: [0.9112426 0.90532544 0.92110454 0.8974359 0.90354331 0.9015748
0.8996063 0.89566929 0.90551181 0.91535433]
mean value: 0.9056368323782013
key: test_fscore
value: [0.81967213 0.90909091 0.75409836 0.83333333 0.86666667 0.86666667
0.83870968 0.93333333 0.85185185 0.83018868]
mean value: 0.8503611609410677
key: train_fscore
value: [0.9132948 0.90551181 0.92063492 0.8984375 0.90335306 0.90272374
0.9005848 0.89708738 0.90697674 0.91714836]
mean value: 0.9065753102337704
key: test_precision
value: [0.75757576 0.92592593 0.71875 0.80645161 0.8125 0.8125
0.76470588 0.875 0.88461538 0.88 ]
mean value: 0.8238024563373235
key: train_precision
value: [0.89433962 0.90551181 0.92430279 0.88803089 0.90513834 0.89230769
0.89189189 0.88505747 0.89312977 0.89811321]
mean value: 0.8977823484465078
key: test_recall
value: [0.89285714 0.89285714 0.79310345 0.86206897 0.92857143 0.92857143
0.92857143 1. 0.82142857 0.78571429]
mean value: 0.8833743842364532
key: train_recall
value: [0.93307087 0.90551181 0.91699605 0.90909091 0.9015748 0.91338583
0.90944882 0.90944882 0.92125984 0.93700787]
mean value: 0.9156795617939062
key: test_roc_auc
value: [0.80849754 0.91194581 0.73583744 0.82389163 0.85714286 0.85714286
0.82142857 0.92857143 0.85714286 0.83928571]
mean value: 0.844088669950739
key: train_roc_auc
value: [0.91119946 0.90532508 0.92109645 0.89745884 0.90354331 0.9015748
0.8996063 0.89566929 0.90551181 0.91535433]
mean value: 0.9056339671967881
key: test_jcc
value: [0.69444444 0.83333333 0.60526316 0.71428571 0.76470588 0.76470588
0.72222222 0.875 0.74193548 0.70967742]
mean value: 0.742557354011214
key: train_jcc
value: [0.84042553 0.82733813 0.85294118 0.81560284 0.82374101 0.82269504
0.81914894 0.81338028 0.82978723 0.84697509]
mean value: 0.8292035258287433
MCC on Blind test: 0.4
Accuracy on Blind test: 0.8
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.021698 0.0257144 0.02186465 0.02199745 0.02193356 0.02225208
0.02210999 0.02229166 0.02531195 0.02225661]
mean value: 0.022743034362792968
key: score_time
value: [0.01183033 0.01237917 0.01338649 0.01215816 0.01205087 0.01202941
0.01245189 0.01203823 0.01343703 0.01191998]
mean value: 0.012368154525756837
key: test_mcc
value: [0.79778885 0.68472906 0.68736396 0.9321832 0.75047877 0.82195294
0.73127242 0.73127242 0.75434227 0.85933785]
mean value: 0.7750721760268345
key: train_mcc
value: [0.83878121 0.84285233 0.85486038 0.8349816 0.8355787 0.84662074
0.83123063 0.84662074 0.83630655 0.8543903 ]
mean value: 0.8422223192370651
key: test_accuracy
value: [0.89473684 0.84210526 0.84210526 0.96491228 0.875 0.91071429
0.85714286 0.85714286 0.875 0.92857143]
mean value: 0.8847431077694236
key: train_accuracy
value: [0.91913215 0.92110454 0.9270217 0.91715976 0.91732283 0.92322835
0.91535433 0.92322835 0.91732283 0.92716535]
mean value: 0.9208040193200702
key: test_fscore
value: [0.9 0.84210526 0.85245902 0.96428571 0.87719298 0.90909091
0.87096774 0.87096774 0.86792453 0.93103448]
mean value: 0.8886028380315576
key: train_fscore
value: [0.92069632 0.92277992 0.92843327 0.91860465 0.91923077 0.92397661
0.91682785 0.92397661 0.91984733 0.92759295]
mean value: 0.9221966289590753
key: test_precision
value: [0.84375 0.82758621 0.8125 1. 0.86206897 0.92592593
0.79411765 0.79411765 0.92 0.9 ]
mean value: 0.8680066392457366
key: train_precision
value: [0.90494297 0.90530303 0.90909091 0.90114068 0.89849624 0.91505792
0.90114068 0.91505792 0.89259259 0.92217899]
mean value: 0.9065001925631475
key: test_recall
value: [0.96428571 0.85714286 0.89655172 0.93103448 0.89285714 0.89285714
0.96428571 0.96428571 0.82142857 0.96428571]
mean value: 0.9149014778325123
key: train_recall
value: [0.93700787 0.94094488 0.9486166 0.93675889 0.94094488 0.93307087
0.93307087 0.93307087 0.9488189 0.93307087]
mean value: 0.9385375494071146
key: test_roc_auc
value: [0.89593596 0.84236453 0.841133 0.96551724 0.875 0.91071429
0.85714286 0.85714286 0.875 0.92857143]
mean value: 0.8848522167487685
key: train_roc_auc
value: [0.91909682 0.92106533 0.92706421 0.91719834 0.91732283 0.92322835
0.91535433 0.92322835 0.91732283 0.92716535]
mean value: 0.9208046746133018
key: test_jcc
value: [0.81818182 0.72727273 0.74285714 0.93103448 0.78125 0.83333333
0.77142857 0.77142857 0.76666667 0.87096774]
mean value: 0.8014421055862936
key: train_jcc
value: [0.85304659 0.85663082 0.86642599 0.84946237 0.85053381 0.85869565
0.84642857 0.85869565 0.85159011 0.8649635 ]
mean value: 0.8556473070988301
MCC on Blind test: 0.68
Accuracy on Blind test: 0.89
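The five models reported so far in this section differ noticeably in how far the blind-test MCC falls below the 10-fold cross-validation mean. The sketch below tabulates that gap using only numbers copied from the blocks above (the mean value under test_mcc and the "MCC on Blind test" line). The blind-test figures are printed to two decimals in the log, so the gaps are approximate; the variable names are illustrative.

```python
# (mean CV test_mcc, blind-test MCC) copied from the log blocks above.
scores = {
    "Logistic RegressionCV": (0.9028553849533496, 0.81),
    "Gaussian NB": (0.5669437167779643, 0.2),
    "Naive Bayes": (0.6064502839116733, 0.31),
    "K-Nearest Neighbors": (0.6941195430259357, 0.4),
    "SVM": (0.7750721760268345, 0.68),
}

# Gap between cross-validation and blind-test MCC, largest first.
gaps = sorted(
    ((name, cv - blind) for name, (cv, blind) in scores.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, gap in gaps:
    print(f"{name:<24} CV-to-blind MCC gap: {gap:.2f}")
```

A large gap suggests that the cross-validation estimate is optimistic for that model on this split; it does not by itself identify the cause.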
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.07555914 2.03910518 1.99554539 1.15945792 2.01181102 2.08725309
2.18683028 2.63542819 2.05218887 2.24368691]
mean value: 2.0486865997314454
key: score_time
value: [0.01258802 0.0125947 0.01385498 0.01252556 0.02199531 0.01386309
0.0126183 0.05426383 0.01399326 0.01492596]
mean value: 0.018322300910949708
key: test_mcc
value: [0.76689254 0.82490815 0.85960591 1. 0.78772636 0.89342711
0.8660254 0.89802651 0.8660254 0.93094934]
mean value: 0.8693586722821179
key: train_mcc
value: [0.99606293 0.99606293 0.99606299 0.98425123 1. 0.99607071
1. 1. 0.99607071 0.99607071]
mean value: 0.9960652223608333
key: test_accuracy
value: [0.87719298 0.9122807 0.92982456 1. 0.89285714 0.94642857
0.92857143 0.94642857 0.92857143 0.96428571]
mean value: 0.9326441102756893
key: train_accuracy
value: [0.99802761 0.99802761 0.99802761 0.99211045 1. 0.9980315
1. 1. 0.9980315 0.9980315 ]
mean value: 0.9980287782074578
key: test_fscore
value: [0.8852459 0.90909091 0.93103448 1. 0.89655172 0.94545455
0.93333333 0.94915254 0.92307692 0.96551724]
mean value: 0.9338457603243798
key: train_fscore
value: [0.99803536 0.99803536 0.99802761 0.99206349 1. 0.99803536
1. 1. 0.99803536 0.99803536]
mean value: 0.9980267922764523
key: test_precision
value: [0.81818182 0.92592593 0.93103448 1. 0.86666667 0.96296296
0.875 0.90322581 1. 0.93333333]
mean value: 0.921633099628094
key: train_precision
value: [0.99607843 0.99607843 0.99606299 0.99601594 1. 0.99607843
1. 1. 0.99607843 0.99607843]
mean value: 0.9972471085243709
key: test_recall
value: [0.96428571 0.89285714 0.93103448 1. 0.92857143 0.92857143
1. 1. 0.85714286 1. ]
mean value: 0.9502463054187192
key: train_recall
value: [1. 1. 1. 0.98814229 1. 1.
1. 1. 1. 1. ]
mean value: 0.9988142292490119
key: test_roc_auc
value: [0.87869458 0.91194581 0.92980296 1. 0.89285714 0.94642857
0.92857143 0.94642857 0.92857143 0.96428571]
mean value: 0.9327586206896552
key: train_roc_auc
value: [0.99802372 0.99802372 0.9980315 0.99210264 1. 0.9980315
1. 1. 0.9980315 0.9980315 ]
mean value: 0.998027605739006
key: test_jcc
value: [0.79411765 0.83333333 0.87096774 1. 0.8125 0.89655172
0.875 0.90322581 0.85714286 0.93333333]
mean value: 0.8776172443393375
key: train_jcc
value: [0.99607843 0.99607843 0.99606299 0.98425197 1. 0.99607843
1. 1. 0.99607843 0.99607843]
mean value: 0.9960707117492666
MCC on Blind test: 0.77
Accuracy on Blind test: 0.92
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.06019211 0.02817774 0.03345776 0.03131986 0.03362656 0.03552794
0.03400302 0.03139901 0.03243303 0.03354049]
mean value: 0.035367751121521
key: score_time
value: [0.01245236 0.00913262 0.00972891 0.00909996 0.01145601 0.01146054
0.00974631 0.01045251 0.00957561 0.00912237]
mean value: 0.010222721099853515
key: test_mcc
value: [0.93202124 0.79110556 0.85960591 0.92980296 0.70082556 1.
0.85933785 0.89802651 0.93094934 0.93094934]
mean value: 0.8832624260804833
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.89473684 0.92982456 0.96491228 0.83928571 1.
0.92857143 0.94642857 0.96428571 0.96428571]
mean value: 0.9397243107769424
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 0.88888889 0.93103448 0.96551724 0.85714286 1.
0.93103448 0.94915254 0.96551724 0.96296296]
mean value: 0.9414213662606415
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.92307692 0.93103448 0.96551724 0.77142857 1.
0.9 0.90322581 0.93333333 1. ]
mean value: 0.9327616358428372
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92857143 0.85714286 0.93103448 0.96551724 0.96428571 1.
0.96428571 1. 1. 0.92857143]
mean value: 0.9539408866995074
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96428571 0.89408867 0.92980296 0.96490148 0.83928571 1.
0.92857143 0.94642857 0.96428571 0.96428571]
mean value: 0.9395935960591133
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 0.8 0.87096774 0.93333333 0.75 1.
0.87096774 0.90322581 0.93333333 0.92857143]
mean value: 0.8918970814132104
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.96
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.12719536 0.13101172 0.12918282 0.12430263 0.12717915 0.12785411
0.12964702 0.13043404 0.13283753 0.13035512]
mean value: 0.12899994850158691
key: score_time
value: [0.01965737 0.01963568 0.01971865 0.0194478 0.01989031 0.01939631
0.01996708 0.02017188 0.0194571 0.01910806]
mean value: 0.019645023345947265
key: test_mcc
value: [0.92980296 0.85960591 0.71921182 0.96551724 0.89342711 0.85714286
0.8660254 0.93094934 0.83484711 0.92857143]
mean value: 0.8785101179086021
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.92982456 0.85964912 0.98245614 0.94642857 0.92857143
0.92857143 0.96428571 0.91071429 0.96428571]
mean value: 0.9379699248120301
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.92857143 0.86206897 0.98245614 0.94736842 0.92857143
0.93333333 0.96551724 0.90196078 0.96428571]
mean value: 0.9378419171661405
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.92857143 0.86206897 1. 0.93103448 0.92857143
0.875 0.93333333 1. 0.96428571]
mean value: 0.9387151067323481
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.92857143 0.86206897 0.96551724 0.96428571 0.92857143
1. 1. 0.82142857 0.96428571]
mean value: 0.9399014778325123
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96490148 0.92980296 0.85960591 0.98275862 0.94642857 0.92857143
0.92857143 0.96428571 0.91071429 0.96428571]
mean value: 0.9379926108374385
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.86666667 0.75757576 0.96551724 0.9 0.86666667
0.875 0.93333333 0.82142857 0.93103448]
mean value: 0.8848257202567548
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.69
Accuracy on Blind test: 0.9
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01180434 0.01166081 0.01208377 0.01183748 0.011832 0.01188374
0.01314521 0.0108161 0.01178122 0.01097274]
mean value: 0.011781740188598632
key: score_time
value: [0.00960636 0.00973678 0.0089736 0.00985813 0.00992084 0.00999618
0.00909567 0.00982451 0.0099051 0.00940228]
mean value: 0.009631943702697755
key: test_mcc
value: [0.50927421 0.54377353 0.59060008 0.7257422 0.5728919 0.42857143
0.75434227 0.64450339 0.39513166 0.57142857]
mean value: 0.5736259244504346
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75438596 0.77192982 0.78947368 0.85964912 0.78571429 0.71428571
0.875 0.82142857 0.69642857 0.78571429]
mean value: 0.7854010025062657
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.74074074 0.76363636 0.8125 0.87096774 0.77777778 0.71428571
0.88135593 0.82758621 0.71186441 0.78571429]
mean value: 0.788642916996997
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.76923077 0.77777778 0.74285714 0.81818182 0.80769231 0.71428571
0.83870968 0.8 0.67741935 0.78571429]
mean value: 0.773186884799788
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.71428571 0.75 0.89655172 0.93103448 0.75 0.71428571
0.92857143 0.85714286 0.75 0.78571429]
mean value: 0.8077586206896552
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75369458 0.77155172 0.78756158 0.85837438 0.78571429 0.71428571
0.875 0.82142857 0.69642857 0.78571429]
mean value: 0.7849753694581281
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.58823529 0.61764706 0.68421053 0.77142857 0.63636364 0.55555556
0.78787879 0.70588235 0.55263158 0.64705882]
mean value: 0.6546892185901474
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.21
Accuracy on Blind test: 0.72
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.09243822 1.99595952 2.03226089 1.98863244 2.03522563 2.05493855
2.07914615 1.96964455 2.04330802 2.02386498]
mean value: 2.031541895866394
key: score_time
value: [0.10404348 0.10880017 0.10026383 0.09849286 0.1004591 0.10068846
0.10119557 0.10098815 0.10136032 0.09345913]
mean value: 0.10097510814666748
key: test_mcc
value: [1. 0.8951918 0.89988258 1. 0.89342711 0.96490128
0.93094934 0.93094934 0.92857143 1. ]
mean value: 0.9443872875319015
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.94736842 0.94736842 1. 0.94642857 0.98214286
0.96428571 0.96428571 0.96428571 1. ]
mean value: 0.9716165413533835
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94545455 0.94545455 1. 0.94736842 0.98181818
0.96551724 0.96551724 0.96428571 1. ]
mean value: 0.9715415890824239
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96296296 1. 1. 0.93103448 1.
0.93333333 0.93333333 0.96428571 1. ]
mean value: 0.9724949826673964
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.92857143 0.89655172 1. 0.96428571 0.96428571
1. 1. 0.96428571 1. ]
mean value: 0.9717980295566503
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.94704433 0.94827586 1. 0.94642857 0.98214286
0.96428571 0.96428571 0.96428571 1. ]
mean value: 0.9716748768472907
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89655172 0.89655172 1. 0.9 0.96428571
0.93333333 0.93333333 0.93103448 1. ]
mean value: 0.9455090311986863
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.79
Accuracy on Blind test: 0.93
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.97834277 1.00785446 0.99240088 0.98267341 1.02293301 0.96102357
0.94727778 0.99126816 0.98970008 1.04643464]
mean value: 0.9919908761978149
key: score_time
value: [0.19152308 0.2427218 0.24201584 0.20301509 0.25864029 0.18198228
0.2727077 0.23134995 0.22634912 0.265769 ]
mean value: 0.23160741329193116
key: test_mcc
value: [1. 0.8951918 0.9321832 1. 0.93094934 0.96490128
0.93094934 0.93094934 0.92857143 1. ]
mean value: 0.9513695720675288
key: train_mcc
value: [0.9685613 0.97645211 0.97645357 0.97245522 0.98032256 0.96862405
0.97250878 0.98032256 0.97250878 0.96862405]
mean value: 0.9736832978955833
key: test_accuracy
value: [1. 0.94736842 0.96491228 1. 0.96428571 0.98214286
0.96428571 0.96428571 0.96428571 1. ]
mean value: 0.97515664160401
key: train_accuracy
value: [0.98422091 0.98816568 0.98816568 0.98619329 0.99015748 0.98425197
0.98622047 0.99015748 0.98622047 0.98425197]
mean value: 0.9868005404649863
key: test_fscore
value: [1. 0.94545455 0.96428571 1. 0.96551724 0.98181818
0.96551724 0.96551724 0.96428571 1. ]
mean value: 0.9752395879982088
key: train_fscore
value: [0.984375 0.98828125 0.98823529 0.98624754 0.99017682 0.984375
0.98630137 0.99017682 0.98630137 0.984375 ]
mean value: 0.98688454626256
key: test_precision
value: [1. 0.96296296 1. 1. 0.93333333 1.
0.93333333 0.93333333 0.96428571 1. ]
mean value: 0.9727248677248678
key: train_precision
value: [0.97674419 0.98062016 0.98054475 0.98046875 0.98823529 0.97674419
0.98054475 0.98823529 0.98054475 0.97674419]
mean value: 0.9809426292658725
key: test_recall
value: [1. 0.92857143 0.93103448 1. 1. 0.96428571
1. 1. 0.96428571 1. ]
mean value: 0.9788177339901478
key: train_recall
value: [0.99212598 0.99606299 0.99604743 0.99209486 0.99212598 0.99212598
0.99212598 0.99212598 0.99212598 0.99212598]
mean value: 0.9929087174379883
key: test_roc_auc
value: [1. 0.94704433 0.96551724 1. 0.96428571 0.98214286
0.96428571 0.96428571 0.96428571 1. ]
mean value: 0.9751847290640394
key: train_roc_auc
value: [0.98420528 0.98815007 0.9881812 0.98620491 0.99015748 0.98425197
0.98622047 0.99015748 0.98622047 0.98425197]
mean value: 0.9868001307148859
key: test_jcc
value: [1. 0.89655172 0.93103448 1. 0.93333333 0.96428571
0.93333333 0.93333333 0.93103448 1. ]
mean value: 0.9522906403940887
key: train_jcc
value: [0.96923077 0.97683398 0.97674419 0.97286822 0.98054475 0.96923077
0.97297297 0.98054475 0.97297297 0.96923077]
mean value: 0.9741174127736429
MCC on Blind test: 0.83
Accuracy on Blind test: 0.94
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0242126 0.01131368 0.01130939 0.01131654 0.01126266 0.01142788
0.01134253 0.01130843 0.01151514 0.01075053]
mean value: 0.012575936317443848
key: score_time
value: [0.00987244 0.00902605 0.00979638 0.0095818 0.00964236 0.00970507
0.00961256 0.00966287 0.00970745 0.00935006]
mean value: 0.009595704078674317
key: test_mcc
value: [0.58562417 0.62473685 0.50927421 0.57973205 0.60753044 0.71428571
0.64951905 0.72168784 0.67900461 0.39310793]
mean value: 0.6064502839116733
key: train_mcc
value: [0.63864108 0.67343572 0.67495523 0.65362362 0.63188315 0.6387663
0.65228602 0.64665231 0.67097829 0.6472967 ]
mean value: 0.6528518419173023
key: test_accuracy
value: [0.78947368 0.80701754 0.75438596 0.78947368 0.80357143 0.85714286
0.82142857 0.85714286 0.83928571 0.69642857]
mean value: 0.8015350877192983
key: train_accuracy
value: [0.81854043 0.83629191 0.83629191 0.82642998 0.81496063 0.81889764
0.82480315 0.82283465 0.83464567 0.82283465]
mean value: 0.8256530618583919
key: test_fscore
value: [0.8 0.81967213 0.76666667 0.8 0.80701754 0.85714286
0.83333333 0.86666667 0.83636364 0.69090909]
mean value: 0.8077771926089441
key: train_fscore
value: [0.82509506 0.84069098 0.84250474 0.83011583 0.8219697 0.82375479
0.83239171 0.82758621 0.84030418 0.82889734]
mean value: 0.8313310537668297
key: test_precision
value: [0.75 0.75757576 0.74193548 0.77419355 0.79310345 0.85714286
0.78125 0.8125 0.85185185 0.7037037 ]
mean value: 0.7823256650808097
key: train_precision
value: [0.79779412 0.82022472 0.81021898 0.81132075 0.7919708 0.80223881
0.79783394 0.80597015 0.8125 0.80147059]
mean value: 0.8051542850964287
key: test_recall
value: [0.85714286 0.89285714 0.79310345 0.82758621 0.82142857 0.85714286
0.89285714 0.92857143 0.82142857 0.67857143]
mean value: 0.8370689655172414
key: train_recall
value: [0.85433071 0.86220472 0.87747036 0.84980237 0.85433071 0.84645669
0.87007874 0.8503937 0.87007874 0.85826772]
mean value: 0.8593414459556192
key: test_roc_auc
value: [0.79064039 0.80849754 0.75369458 0.7887931 0.80357143 0.85714286
0.82142857 0.85714286 0.83928571 0.69642857]
mean value: 0.8016625615763546
key: train_roc_auc
value: [0.8184697 0.8362407 0.83637297 0.82647599 0.81496063 0.81889764
0.82480315 0.82283465 0.83464567 0.82283465]
mean value: 0.8256535744296786
key: test_jcc
value: [0.66666667 0.69444444 0.62162162 0.66666667 0.67647059 0.75
0.71428571 0.76470588 0.71875 0.52777778]
mean value: 0.6801389362051127
key: train_jcc
value: [0.70226537 0.72516556 0.72786885 0.70957096 0.6977492 0.70032573
0.71290323 0.70588235 0.72459016 0.70779221]
mean value: 0.7114113624151682
MCC on Blind test: 0.31
Accuracy on Blind test: 0.73
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.08987308 0.22393084 0.25971007 0.22161651 0.2970469 0.07107282
0.3908968 0.34707355 0.3672266 0.26015067]
mean value: 0.25285978317260743
key: score_time
value: [0.01140714 0.01223254 0.01125836 0.01227474 0.0113318 0.01112461
0.01194763 0.01308942 0.01288772 0.01307106]
mean value: 0.012062501907348634
key: test_mcc
value: [1. 0.82880708 0.96551724 0.96547546 0.89802651 1.
0.96490128 0.93094934 0.96490128 1. ]
mean value: 0.9518578190858389
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.9122807 0.98245614 0.98245614 0.94642857 1.
0.98214286 0.96428571 0.98214286 1. ]
mean value: 0.975219298245614
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.90566038 0.98245614 0.98305085 0.94915254 1.
0.98245614 0.96551724 0.98245614 1. ]
mean value: 0.9750749429620941
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96 1. 0.96666667 0.90322581 1.
0.96551724 0.93333333 0.96551724 1. ]
mean value: 0.9694260289210234
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.85714286 0.96551724 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9822660098522168
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.91133005 0.98275862 0.98214286 0.94642857 1.
0.98214286 0.96428571 0.98214286 1. ]
mean value: 0.9751231527093597
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.82758621 0.96551724 0.96666667 0.90322581 1.
0.96551724 0.93333333 0.96551724 1. ]
mean value: 0.9527363737486095
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.88
Accuracy on Blind test: 0.96
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.0668292 0.06841993 0.12328172 0.05418634 0.05178666 0.0750885
0.09460568 0.09279466 0.08649731 0.06908274]
mean value: 0.07825727462768554
key: score_time
value: [0.01650906 0.02038956 0.02707815 0.01239896 0.01989675 0.01733708
0.01960182 0.02041531 0.02011204 0.01233983]
mean value: 0.01860785484313965
key: test_mcc
value: [0.9321832 0.82512315 0.82490815 0.8951918 0.89342711 1.
0.85933785 0.82618439 1. 0.96490128]
mean value: 0.9021256935269077
key: train_mcc
value: [0.96055211 0.96844169 0.97239426 0.96844169 0.96850394 0.9645744
0.96850394 0.98425197 0.96850394 0.9645744 ]
mean value: 0.968874234820875
key: test_accuracy
value: [0.96491228 0.9122807 0.9122807 0.94736842 0.94642857 1.
0.92857143 0.91071429 1. 0.98214286]
mean value: 0.9504699248120301
key: train_accuracy
value: [0.98027613 0.98422091 0.98619329 0.98422091 0.98425197 0.98228346
0.98425197 0.99212598 0.98425197 0.98228346]
mean value: 0.9844360061501188
key: test_fscore
value: [0.96551724 0.9122807 0.91525424 0.94915254 0.94736842 1.
0.93103448 0.91525424 1. 0.98181818]
mean value: 0.9517680045712283
key: train_fscore
value: [0.98031496 0.98425197 0.98619329 0.98418972 0.98425197 0.98224852
0.98425197 0.99212598 0.98425197 0.98231827]
mean value: 0.98443986279333
key: test_precision
value: [0.93333333 0.89655172 0.9 0.93333333 0.93103448 1.
0.9 0.87096774 1. 1. ]
mean value: 0.9365220615498703
key: train_precision
value: [0.98031496 0.98425197 0.98425197 0.98418972 0.98425197 0.98418972
0.98425197 0.99212598 0.98425197 0.98039216]
mean value: 0.9842472390904636
key: test_recall
value: [1. 0.92857143 0.93103448 0.96551724 0.96428571 1.
0.96428571 0.96428571 1. 0.96428571]
mean value: 0.9682266009852217
key: train_recall
value: [0.98031496 0.98425197 0.98814229 0.98418972 0.98425197 0.98031496
0.98425197 0.99212598 0.98425197 0.98425197]
mean value: 0.9846347763841773
key: test_roc_auc
value: [0.96551724 0.91256158 0.91194581 0.94704433 0.94642857 1.
0.92857143 0.91071429 1. 0.98214286]
mean value: 0.9504926108374385
key: train_roc_auc
value: [0.98027606 0.98422085 0.98619713 0.98422085 0.98425197 0.98228346
0.98425197 0.99212598 0.98425197 0.98228346]
mean value: 0.984436369860882
key: test_jcc
value: [0.93333333 0.83870968 0.84375 0.90322581 0.9 1.
0.87096774 0.84375 1. 0.96428571]
mean value: 0.9098022273425499
key: train_jcc
value: [0.96138996 0.96899225 0.97276265 0.9688716 0.96899225 0.96511628
0.96899225 0.984375 0.96899225 0.96525097]
mean value: 0.9693735439203892
MCC on Blind test: 0.79
Accuracy on Blind test: 0.93
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02307963 0.01162243 0.01094699 0.01106811 0.01101923 0.01119828
0.01114917 0.01126027 0.01035953 0.01032233]
mean value: 0.012202596664428711
key: score_time
value: [0.01033926 0.01000857 0.00924134 0.00996804 0.01003981 0.01006603
0.00951672 0.00953388 0.00911236 0.00968027]
mean value: 0.009750628471374511
key: test_mcc
value: [0.52204981 0.68850906 0.57881773 0.64901478 0.50128041 0.64285714
0.57735027 0.65814518 0.64285714 0.53605627]
mean value: 0.599693779541295
key: train_mcc
value: [0.60210948 0.6702837 0.66589861 0.59833978 0.6189214 0.59993353
0.65074202 0.63496646 0.6918185 0.59961602]
mean value: 0.6332629515394039
key: test_accuracy
value: [0.75438596 0.84210526 0.78947368 0.8245614 0.75 0.82142857
0.78571429 0.82142857 0.82142857 0.76785714]
mean value: 0.7978383458646616
key: train_accuracy
value: [0.80078895 0.83431953 0.83234714 0.79881657 0.80905512 0.7992126
0.82480315 0.81692913 0.84448819 0.7992126 ]
mean value: 0.8159972976750687
key: test_fscore
value: [0.77419355 0.84745763 0.79310345 0.82758621 0.74074074 0.82142857
0.8 0.83870968 0.82142857 0.77192982]
mean value: 0.8036578216256797
key: train_fscore
value: [0.80539499 0.84030418 0.83685221 0.8030888 0.81381958 0.80608365
0.82982792 0.82217973 0.85122411 0.80534351]
mean value: 0.8214118676278633
key: test_precision
value: [0.70588235 0.80645161 0.79310345 0.82758621 0.76923077 0.82142857
0.75 0.76470588 0.82142857 0.75862069]
mean value: 0.7818438105112842
key: train_precision
value: [0.78867925 0.8125 0.81343284 0.78490566 0.79400749 0.77941176
0.80669145 0.79925651 0.81588448 0.78148148]
mean value: 0.7976250910229972
key: test_recall
value: [0.85714286 0.89285714 0.79310345 0.82758621 0.71428571 0.82142857
0.85714286 0.92857143 0.82142857 0.78571429]
mean value: 0.8299261083743842
key: train_recall
value: [0.82283465 0.87007874 0.86166008 0.82213439 0.83464567 0.83464567
0.85433071 0.84645669 0.88976378 0.83070866]
mean value: 0.8467259033332296
key: test_roc_auc
value: [0.75615764 0.8429803 0.78940887 0.82450739 0.75 0.82142857
0.78571429 0.82142857 0.82142857 0.76785714]
mean value: 0.7980911330049261
key: train_roc_auc
value: [0.80074539 0.83424886 0.83240484 0.79886247 0.80905512 0.7992126
0.82480315 0.81692913 0.84448819 0.7992126 ]
mean value: 0.8159962341663813
key: test_jcc
value: [0.63157895 0.73529412 0.65714286 0.70588235 0.58823529 0.6969697
0.66666667 0.72222222 0.6969697 0.62857143]
mean value: 0.6729533280616872
key: train_jcc
value: [0.67419355 0.72459016 0.71947195 0.67096774 0.68608414 0.67515924
0.70915033 0.69805195 0.74098361 0.67412141]
mean value: 0.6972774066672848
MCC on Blind test: 0.52
Accuracy on Blind test: 0.8
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02162695 0.02330804 0.02636027 0.02295399 0.0236361 0.06173849
0.02118683 0.02854466 0.02622795 0.0239768 ]
mean value: 0.027956008911132812
key: score_time
value: [0.01124668 0.01192689 0.01218081 0.01735902 0.01788497 0.0120585
0.01298404 0.01233673 0.0231607 0.01238585]
mean value: 0.0143524169921875
key: test_mcc
value: [0.86189955 0.82880708 0.86189955 0.89988258 0.79385662 1.
0.89802651 0.89342711 0.80439967 0.93094934]
mean value: 0.8773147998171975
key: train_mcc
value: [0.97636129 0.96450468 0.97239426 0.91875999 0.93470218 0.95322883
0.94970991 0.97649905 0.77972956 0.97250878]
mean value: 0.9398398525421525
key: test_accuracy
value: [0.92982456 0.9122807 0.92982456 0.94736842 0.89285714 1.
0.94642857 0.94642857 0.89285714 0.96428571]
mean value: 0.9362155388471178
key: train_accuracy
value: [0.98816568 0.98224852 0.98619329 0.95857988 0.96653543 0.97637795
0.97440945 0.98818898 0.87992126 0.98622047]
mean value: 0.9686840920032925
key: test_fscore
value: [0.93103448 0.90566038 0.92857143 0.94545455 0.9 1.
0.94915254 0.94736842 0.88 0.96551724]
mean value: 0.9352759038947909
key: train_fscore
value: [0.98823529 0.98224852 0.98619329 0.95723014 0.96749522 0.97674419
0.97495183 0.98809524 0.86474501 0.98630137]
mean value: 0.9672240106699174
key: test_precision
value: [0.9 0.96 0.96296296 1. 0.84375 1.
0.90322581 0.93103448 1. 0.93333333]
mean value: 0.943430658550653
key: train_precision
value: [0.984375 0.98418972 0.98425197 0.98739496 0.94052045 0.96183206
0.95471698 0.996 0.98984772 0.98054475]
mean value: 0.9763673600922473
key: test_recall
value: [0.96428571 0.85714286 0.89655172 0.89655172 0.96428571 1.
1. 0.96428571 0.78571429 1. ]
mean value: 0.9328817733990148
key: train_recall
value: [0.99212598 0.98031496 0.98814229 0.92885375 0.99606299 0.99212598
0.99606299 0.98031496 0.76771654 0.99212598]
mean value: 0.9613846441131617
key: test_roc_auc
value: [0.93041872 0.91133005 0.93041872 0.94827586 0.89285714 1.
0.94642857 0.94642857 0.89285714 0.96428571]
mean value: 0.9363300492610838
key: train_roc_auc
value: [0.98815785 0.98225234 0.98619713 0.95852137 0.96653543 0.97637795
0.97440945 0.98818898 0.87992126 0.98622047]
mean value: 0.9686782235224549
key: test_jcc
value: [0.87096774 0.82758621 0.86666667 0.89655172 0.81818182 1.
0.90322581 0.9 0.78571429 0.93333333]
mean value: 0.8802227583317683
key: train_jcc
value: [0.97674419 0.96511628 0.97276265 0.91796875 0.93703704 0.95454545
0.95112782 0.97647059 0.76171875 0.97297297]
mean value: 0.9386464483370307
MCC on Blind test: 0.74
Accuracy on Blind test: 0.91
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01865149 0.02236271 0.02846265 0.02018642 0.02099609 0.01799417
0.01969862 0.01990604 0.02230525 0.02031541]
mean value: 0.0210878849029541
key: score_time
value: [0.02322531 0.0123446 0.01212907 0.01237273 0.01238465 0.01233506
0.01233649 0.01211786 0.01209664 0.02831006]
mean value: 0.014965248107910157
key: test_mcc
value: [0.56067321 0.82880708 0.66755025 0.93202124 0.71428571 0.74535599
0.60485838 0.85714286 0.96490128 0.93094934]
mean value: 0.7806545352178498
key: train_mcc
value: [0.54270333 0.98028353 0.80481374 0.9417201 0.97244848 0.75996798
0.81248429 0.92727605 0.94217971 0.95349515]
mean value: 0.8637372357336665
key: test_accuracy
value: [0.73684211 0.9122807 0.80701754 0.96491228 0.85714286 0.85714286
0.76785714 0.92857143 0.98214286 0.96428571]
mean value: 0.8778195488721804
key: train_accuracy
value: [0.72781065 0.99013807 0.89546351 0.9704142 0.98622047 0.86811024
0.8976378 0.96259843 0.97047244 0.97637795]
mean value: 0.9245243752814922
key: test_fscore
value: [0.78873239 0.90566038 0.76595745 0.96666667 0.85714286 0.83333333
0.8115942 0.92857143 0.98245614 0.96551724]
mean value: 0.8805632088876222
key: train_fscore
value: [0.78637771 0.99017682 0.88453159 0.97098646 0.98624754 0.8494382
0.90714286 0.96130346 0.97120921 0.97683398]
mean value: 0.9284247832831198
key: test_precision
value: [0.65116279 0.96 1. 0.93548387 0.85714286 1.
0.68292683 0.92857143 0.96551724 0.93333333]
mean value: 0.8914138351360639
key: train_precision
value: [0.64795918 0.98823529 0.98543689 0.95075758 0.98431373 0.9895288
0.83006536 0.99578059 0.94756554 0.95833333]
mean value: 0.9277976294653208
key: test_recall
value: [1. 0.85714286 0.62068966 1. 0.85714286 0.71428571
1. 0.92857143 1. 1. ]
mean value: 0.8977832512315271
key: train_recall
value: [1. 0.99212598 0.80237154 0.99209486 0.98818898 0.74409449
1. 0.92913386 0.99606299 0.99606299]
mean value: 0.9440135694500638
key: test_roc_auc
value: [0.74137931 0.91133005 0.81034483 0.96428571 0.85714286 0.85714286
0.76785714 0.92857143 0.98214286 0.96428571]
mean value: 0.878448275862069
key: train_roc_auc
value: [0.72727273 0.99013414 0.89528026 0.97045688 0.98622047 0.86811024
0.8976378 0.96259843 0.97047244 0.97637795]
mean value: 0.9244561327067319
key: test_jcc
value: [0.65116279 0.82758621 0.62068966 0.93548387 0.75 0.71428571
0.68292683 0.86666667 0.96551724 0.93333333]
mean value: 0.79476523086677
key: train_jcc
value: [0.64795918 0.98054475 0.79296875 0.94360902 0.97286822 0.73828125
0.83006536 0.9254902 0.94402985 0.95471698]
mean value: 0.8730533557799736
MCC on Blind test: 0.62
Accuracy on Blind test: 0.82
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.25352716 0.23959851 0.23895645 0.23683834 0.23459578 0.23570085
0.23769593 0.24037766 0.24266458 0.23892522]
mean value: 0.23988804817199708
key: score_time
value: [0.01613116 0.01571035 0.01589203 0.01570082 0.01559377 0.01582003
0.01564693 0.01657271 0.01562572 0.01605487]
mean value: 0.015874838829040526
key: test_mcc
value: [0.96547546 0.82880708 0.9321832 1. 0.89802651 1.
0.92857143 0.93094934 0.96490128 1. ]
mean value: 0.944891429609529
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.9122807 0.96491228 1. 0.94642857 1.
0.96428571 0.96428571 0.98214286 1. ]
mean value: 0.9716791979949875
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.90566038 0.96428571 1. 0.94915254 1.
0.96428571 0.96551724 0.98245614 1. ]
mean value: 0.971317591185117
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96 1. 1. 0.90322581 1.
0.96428571 0.93333333 0.96551724 1. ]
mean value: 0.9726362095449971
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.85714286 0.93103448 1. 1. 1.
0.96428571 1. 1. 1. ]
mean value: 0.9716748768472906
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.91133005 0.96551724 1. 0.94642857 1.
0.96428571 0.96428571 0.98214286 1. ]
mean value: 0.9716133004926109
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.82758621 0.93103448 1. 0.90322581 1.
0.93103448 0.93333333 0.96551724 1. ]
mean value: 0.9456017267863764
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.88
Accuracy on Blind test: 0.96
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.06545424 0.09734583 0.08027411 0.07696271 0.09823036 0.08253455
0.10306764 0.08882976 0.0632627 0.07575917]
mean value: 0.08317210674285888
key: score_time
value: [0.01944017 0.03101039 0.03404474 0.02634931 0.02568722 0.02294087
0.02933931 0.02022123 0.02197647 0.0271163 ]
mean value: 0.025812602043151854
key: test_mcc
value: [0.96547546 0.79110556 0.89988258 1. 0.79385662 1.
0.85714286 0.93094934 0.93094934 1. ]
mean value: 0.9169361747244729
key: train_mcc
value: [1. 0.99211042 0.98817342 0.98425123 1. 0.99607071
0.99607071 0.98819663 0.99607071 0.98819663]
mean value: 0.9929140477452736
key: test_accuracy
value: [0.98245614 0.89473684 0.94736842 1. 0.89285714 1.
0.92857143 0.96428571 0.96428571 1. ]
mean value: 0.9574561403508772
key: train_accuracy
value: [1. 0.99605523 0.99408284 0.99211045 1. 0.9980315
0.9980315 0.99409449 0.9980315 0.99409449]
mean value: 0.9964531985276989
key: test_fscore
value: [0.98181818 0.88888889 0.94545455 1. 0.9 1.
0.92857143 0.96551724 0.96551724 1. ]
mean value: 0.9575767527491665
key: train_fscore
value: [1. 0.99606299 0.99408284 0.99206349 1. 0.99802761
0.99803536 0.99410609 0.99803536 0.99410609]
mean value: 0.9964519845500475
key: test_precision
value: [1. 0.92307692 1. 1. 0.84375 1.
0.92857143 0.93333333 0.93333333 1. ]
mean value: 0.9562065018315018
key: train_precision
value: [1. 0.99606299 0.99212598 0.99601594 1. 1.
0.99607843 0.99215686 0.99607843 0.99215686]
mean value: 0.9960675500868227
key: test_recall
value: [0.96428571 0.85714286 0.89655172 1. 0.96428571 1.
0.92857143 1. 1. 1. ]
mean value: 0.9610837438423645
key: train_recall
value: [1. 0.99606299 0.99604743 0.98814229 1. 0.99606299
1. 0.99606299 1. 0.99606299]
mean value: 0.9968441691824095
key: test_roc_auc
value: [0.98214286 0.89408867 0.94827586 1. 0.89285714 1.
0.92857143 0.96428571 0.96428571 1. ]
mean value: 0.9574507389162562
key: train_roc_auc
value: [1. 0.99605521 0.99408671 0.99210264 1. 0.9980315
0.9980315 0.99409449 0.9980315 0.99409449]
mean value: 0.9964528025893996
key: test_jcc
value: [0.96428571 0.8 0.89655172 1. 0.81818182 1.
0.86666667 0.93333333 0.93333333 1. ]
mean value: 0.9212352589938797
key: train_jcc
value: [1. 0.99215686 0.98823529 0.98425197 1. 0.99606299
0.99607843 0.98828125 0.99607843 0.98828125]
mean value: 0.9929426480237764
MCC on Blind test: 0.87
Accuracy on Blind test: 0.96
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.24165225 0.2639544 0.21907234 0.1663208 0.1660831 0.18476343
0.18354225 0.15284801 0.23790598 0.2857213 ]
mean value: 0.21018638610839843
key: score_time
value: [0.02633309 0.04548001 0.03008986 0.05320883 0.0402391 0.03054214
0.01558995 0.02765036 0.02605081 0.04492259]
mean value: 0.034010672569274904
key: test_mcc
value: [0.5149026 0.78940887 0.51851399 0.79778885 0.75047877 0.78571429
0.61706091 0.89802651 0.79385662 0.75047877]
mean value: 0.7216230189409703
key: train_mcc
value: [0.99606293 0.98817342 0.98817323 0.98425123 0.98819663 0.99607071
0.99212598 0.98819663 0.99212598 0.98819663]
mean value: 0.9901573396651306
key: test_accuracy
value: [0.75438596 0.89473684 0.75438596 0.89473684 0.875 0.89285714
0.80357143 0.94642857 0.89285714 0.875 ]
mean value: 0.8583959899749374
key: train_accuracy
value: [0.99802761 0.99408284 0.99408284 0.99211045 0.99409449 0.9980315
0.99606299 0.99409449 0.99606299 0.99409449]
mean value: 0.9950744692416407
key: test_fscore
value: [0.76666667 0.89285714 0.78125 0.88888889 0.87719298 0.89285714
0.81967213 0.94915254 0.88461538 0.87272727]
mean value: 0.8625880154589062
key: train_fscore
value: [0.99803536 0.99408284 0.99405941 0.99206349 0.99408284 0.99803536
0.99606299 0.99408284 0.99606299 0.99408284]
mean value: 0.9950650970118321
key: test_precision
value: [0.71875 0.89285714 0.71428571 0.96 0.86206897 0.89285714
0.75757576 0.90322581 0.95833333 0.88888889]
mean value: 0.8548842751766834
key: train_precision
value: [0.99607843 0.99604743 0.99603175 0.99601594 0.99604743 0.99607843
0.99606299 0.99604743 0.99606299 0.99604743]
mean value: 0.996052025260395
key: test_recall
value: [0.82142857 0.89285714 0.86206897 0.82758621 0.89285714 0.89285714
0.89285714 1. 0.82142857 0.85714286]
mean value: 0.8761083743842365
key: train_recall
value: [1. 0.99212598 0.99209486 0.98814229 0.99212598 1.
0.99606299 0.99212598 0.99606299 0.99212598]
mean value: 0.994086707541004
key: test_roc_auc
value: [0.75554187 0.89470443 0.75246305 0.89593596 0.875 0.89285714
0.80357143 0.94642857 0.89285714 0.875 ]
mean value: 0.858435960591133
key: train_roc_auc
value: [0.99802372 0.99408671 0.99407893 0.99210264 0.99409449 0.9980315
0.99606299 0.99409449 0.99606299 0.99409449]
mean value: 0.9950732937038996
key: test_jcc
value: [0.62162162 0.80645161 0.64102564 0.8 0.78125 0.80645161
0.69444444 0.90322581 0.79310345 0.77419355]
mean value: 0.762176773601273
key: train_jcc
value: [0.99607843 0.98823529 0.98818898 0.98425197 0.98823529 0.99607843
0.99215686 0.98823529 0.99215686 0.98823529]
mean value: 0.9901852709587773
MCC on Blind test: 0.47
Accuracy on Blind test: 0.82
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.97148848 0.95016146 0.97649026 0.94490004 0.95638227 0.95632815
0.95845008 0.95048833 0.95981479 0.96253824]
mean value: 0.9587042093276977
key: score_time
value: [0.00969386 0.00933099 0.00937676 0.00944138 0.00976372 0.00949907
0.00940275 0.00956464 0.009552 0.00934672]
mean value: 0.00949718952178955
key: test_mcc
value: [0.93202124 0.82880708 0.9321832 1. 0.8660254 1.
0.89342711 0.93094934 0.96490128 0.96490128]
mean value: 0.9313215939799506
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.9122807 0.96491228 1. 0.92857143 1.
0.94642857 0.96428571 0.98214286 0.98214286]
mean value: 0.9645676691729324
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 0.90566038 0.96428571 1. 0.93333333 1.
0.94545455 0.96551724 0.98245614 0.98181818]
mean value: 0.9641488496943416
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96 1. 1. 0.875 1.
0.96296296 0.93333333 0.96551724 1. ]
mean value: 0.9696813537675607
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92857143 0.85714286 0.93103448 1. 1. 1.
0.92857143 1. 1. 0.96428571]
mean value: 0.9609605911330049
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96428571 0.91133005 0.96551724 1. 0.92857143 1.
0.94642857 0.96428571 0.98214286 0.98214286]
mean value: 0.9644704433497537
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 0.82758621 0.93103448 1. 0.875 1.
0.89655172 0.93333333 0.96551724 0.96428571]
mean value: 0.932188013136289
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.96
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03023362 0.03938198 0.03128219 0.03190351 0.03164005 0.03126192
0.03192115 0.03110099 0.03161693 0.0320549 ]
mean value: 0.03223972320556641
key: score_time
value: [0.01244426 0.01752877 0.01386118 0.01407957 0.01385164 0.01395226
0.01401639 0.01404047 0.01423621 0.01403999]
mean value: 0.014205074310302735
key: test_mcc
value: [0.8951918 0.8951918 0.93202124 1. 0.96490128 0.93094934
0.96490128 0.96490128 0.96490128 1. ]
mean value: 0.9512959308288262
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.94736842 0.96491228 1. 0.98214286 0.96428571
0.98214286 0.98214286 0.98214286 1. ]
mean value: 0.975250626566416
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94545455 0.94545455 0.96666667 1. 0.98245614 0.96551724
0.98245614 0.98245614 0.98245614 1. ]
mean value: 0.9752917560358576
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96296296 0.96296296 0.93548387 1. 0.96551724 0.93333333
0.96551724 0.96551724 0.96551724 1. ]
mean value: 0.9656812095744243
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92857143 0.92857143 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9857142857142858
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94704433 0.94704433 0.96428571 1. 0.98214286 0.96428571
0.98214286 0.98214286 0.98214286 1. ]
mean value: 0.9751231527093597
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.89655172 0.89655172 0.93548387 1. 0.96551724 0.93333333
0.96551724 0.96551724 0.96551724 1. ]
mean value: 0.9523989618094179
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.1
Accuracy on Blind test: 0.76
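Note on the QDA blind-test figures: an accuracy of 0.76 together with an MCC of about -0.1 is what a classifier that predicts almost only the majority class would produce. The sketch below shows the arithmetic. The confusion-matrix counts are hypothetical, chosen only so that the rounded results match the two numbers printed above; the real blind-test counts are not in this log.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a marginal total is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy_from_counts(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# HYPOTHETICAL counts (not taken from the log): 80 negatives, 20 positives,
# and a model that never finds a true positive.
counts = dict(tp=0, tn=76, fp=4, fn=20)

acc = accuracy_from_counts(**counts)
mcc = mcc_from_counts(**counts)
print("Accuracy:", round(acc, 2))
print("MCC:", round(mcc, 2))
```

With these counts the accuracy is carried entirely by the true negatives, while the MCC numerator (tp*tn - fp*fn) is negative because no positive is recovered.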
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02729225 0.03505087 0.03882575 0.04039145 0.03900075 0.03893924
0.03912377 0.03932238 0.03904605 0.03894711]
mean value: 0.037593960762023926
key: score_time
value: [0.01924682 0.02053595 0.01905107 0.0189836 0.01907921 0.01899457
0.01899743 0.01894832 0.01898527 0.01905417]
mean value: 0.01918764114379883
key: test_mcc
value: [0.9321832 0.8951918 0.92980296 1. 0.85933785 1.
0.85933785 0.85933785 0.96490128 0.93094934]
mean value: 0.9231042121808317
key: train_mcc
value: [0.96450413 0.97239383 0.96847232 0.96847232 0.97244848 0.9645744
0.97244848 0.97637795 0.9645744 0.9645744 ]
mean value: 0.9688840740520619
key: test_accuracy
value: [0.96491228 0.94736842 0.96491228 1. 0.92857143 1.
0.92857143 0.92857143 0.98214286 0.96428571]
mean value: 0.9609335839598998
key: train_accuracy
value: [0.98224852 0.98619329 0.98422091 0.98422091 0.98622047 0.98228346
0.98622047 0.98818898 0.98228346 0.98228346]
mean value: 0.9844363944151951
key: test_fscore
value: [0.96551724 0.94545455 0.96551724 1. 0.93103448 1.
0.93103448 0.93103448 0.98181818 0.96551724]
mean value: 0.961692789968652
key: train_fscore
value: [0.98231827 0.98624754 0.98425197 0.98425197 0.98624754 0.98231827
0.98624754 0.98818898 0.98231827 0.98231827]
mean value: 0.9844708630478165
key: test_precision
value: [0.93333333 0.96296296 0.96551724 1. 0.9 1.
0.9 0.9 1. 0.93333333]
mean value: 0.949514687100894
key: train_precision
value: [0.98039216 0.98431373 0.98039216 0.98039216 0.98431373 0.98039216
0.98431373 0.98818898 0.98039216 0.98039216]
mean value: 0.9823483094025012
key: test_recall
value: [1. 0.92857143 0.96551724 1. 0.96428571 1.
0.96428571 0.96428571 0.96428571 1. ]
mean value: 0.9751231527093596
key: train_recall
value: [0.98425197 0.98818898 0.98814229 0.98814229 0.98818898 0.98425197
0.98818898 0.98818898 0.98425197 0.98425197]
mean value: 0.9866048364507797
key: test_roc_auc
value: [0.96551724 0.94704433 0.96490148 1. 0.92857143 1.
0.92857143 0.92857143 0.98214286 0.96428571]
mean value: 0.960960591133005
key: train_roc_auc
value: [0.98224456 0.98618935 0.98422863 0.98422863 0.98622047 0.98228346
0.98622047 0.98818898 0.98228346 0.98228346]
mean value: 0.9844371479256793
key: test_jcc
value: [0.93333333 0.89655172 0.93333333 1. 0.87096774 1.
0.87096774 0.87096774 0.96428571 0.93333333]
mean value: 0.9273740664230097
key: train_jcc
value: [0.96525097 0.97286822 0.96899225 0.96899225 0.97286822 0.96525097
0.97286822 0.9766537 0.96525097 0.96525097]
mean value: 0.9694246704788737
MCC on Blind test: 0.81
Accuracy on Blind test: 0.93
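Note on the key/value/mean value layout: the names fit_time, score_time, test_mcc, train_mcc and so on have the shape of the dictionary returned by scikit-learn's cross_validate when it is given a dict of scorers and return_train_score=True. The sketch below reproduces that layout under assumptions: the data are synthetic, the scorer names mcc, accuracy and jcc are guesses based on the keys in this log, and the ColumnTransformer step is reduced to a single MinMaxScaler. It is not the script that produced this log.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption: the real feature table is not in the log).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Simplified pipeline: scaling step followed by the model step.
pipe = Pipeline(steps=[("prep", MinMaxScaler()),
                       ("model", RidgeClassifier(random_state=42))])

# Assumed scorer names; cross_validate prefixes each with test_ and train_.
scoring = {"mcc": make_scorer(matthews_corrcoef),
           "accuracy": "accuracy",
           "jcc": "jaccard"}

# 10-fold CV, matching the ten values per key seen in the log.
scores = cross_validate(pipe, X, y, cv=10, scoring=scoring,
                        return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```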
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.37074566 0.28954816 0.30450511 0.2897861 0.29390836 0.33179474
0.34224701 0.31382322 0.29044604 0.29306006]
mean value: 0.31198644638061523
key: score_time
value: [0.01924872 0.01909328 0.01921964 0.01943707 0.01910305 0.01929712
0.01911283 0.01910567 0.01906586 0.01909971]
mean value: 0.019178295135498048
key: test_mcc
value: [0.9321832 0.8951918 0.92980296 1. 0.85933785 1.
0.85933785 0.85933785 0.96490128 0.93094934]
mean value: 0.9231042121808317
key: train_mcc
value: [0.96450413 0.97239383 0.96847232 0.96847232 0.97244848 0.9645744
0.97244848 0.97637795 0.9645744 0.9645744 ]
mean value: 0.9688840740520619
key: test_accuracy
value: [0.96491228 0.94736842 0.96491228 1. 0.92857143 1.
0.92857143 0.92857143 0.98214286 0.96428571]
mean value: 0.9609335839598998
key: train_accuracy
value: [0.98224852 0.98619329 0.98422091 0.98422091 0.98622047 0.98228346
0.98622047 0.98818898 0.98228346 0.98228346]
mean value: 0.9844363944151951
key: test_fscore
value: [0.96551724 0.94545455 0.96551724 1. 0.93103448 1.
0.93103448 0.93103448 0.98181818 0.96551724]
mean value: 0.961692789968652
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:128: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:131: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: train_fscore
value: [0.98231827 0.98624754 0.98425197 0.98425197 0.98624754 0.98231827
0.98624754 0.98818898 0.98231827 0.98231827]
mean value: 0.9844708630478165
key: test_precision
value: [0.93333333 0.96296296 0.96551724 1. 0.9 1.
0.9 0.9 1. 0.93333333]
mean value: 0.949514687100894
key: train_precision
value: [0.98039216 0.98431373 0.98039216 0.98039216 0.98431373 0.98039216
0.98431373 0.98818898 0.98039216 0.98039216]
mean value: 0.9823483094025012
key: test_recall
value: [1. 0.92857143 0.96551724 1. 0.96428571 1.
0.96428571 0.96428571 0.96428571 1. ]
mean value: 0.9751231527093596
key: train_recall
value: [0.98425197 0.98818898 0.98814229 0.98814229 0.98818898 0.98425197
0.98818898 0.98818898 0.98425197 0.98425197]
mean value: 0.9866048364507797
key: test_roc_auc
value: [0.96551724 0.94704433 0.96490148 1. 0.92857143 1.
0.92857143 0.92857143 0.98214286 0.96428571]
mean value: 0.960960591133005
key: train_roc_auc
value: [0.98224456 0.98618935 0.98422863 0.98422863 0.98622047 0.98228346
0.98622047 0.98818898 0.98228346 0.98228346]
mean value: 0.9844371479256793
key: test_jcc
value: [0.93333333 0.89655172 0.93333333 1. 0.87096774 1.
0.87096774 0.87096774 0.96428571 0.93333333]
mean value: 0.9273740664230097
key: train_jcc
value: [0.96525097 0.97286822 0.96899225 0.96899225 0.97286822 0.96525097
0.97286822 0.9766537 0.96525097 0.96525097]
mean value: 0.9694246704788737
MCC on Blind test: 0.81
Accuracy on Blind test: 0.93
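Note on the SettingWithCopyWarning reported for embb_8020.py lines 128 and 131: pandas raises it when sort_values(..., inplace = True) is called on a frame that was itself sliced from another frame. One common way to avoid it is to take an explicit copy and reassign the sorted result. The sketch below assumes smnc_CT and smnc_BT are column subsets of one score table; the parent table name all_scores and its layout are hypothetical, and the values are rounded figures from this log.

```python
import pandas as pd

# Hypothetical parent table; numbers are rounded from the results in this log.
all_scores = pd.DataFrame({
    "model": ["Logistic Regression", "QDA", "Ridge Classifier"],
    "test_mcc": [0.84, 0.95, 0.92],
    "bts_mcc": [0.69, -0.10, 0.81],
})

# .copy() makes each subset an independent frame, so later changes are
# unambiguous and pandas has nothing to warn about.
smnc_CT = all_scores[["model", "test_mcc"]].copy()
smnc_BT = all_scores[["model", "bts_mcc"]].copy()

# Reassign instead of sorting in place.
smnc_CT = smnc_CT.sort_values(by=["test_mcc"], ascending=False)
smnc_BT = smnc_BT.sort_values(by=["bts_mcc"], ascending=False)

print(smnc_CT)
print(smnc_BT)
```

Sorting by the two columns gives different rankings here: QDA leads on cross-validated MCC but is last on the blind test.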
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03503633 0.04945874 0.03754044 0.03722906 0.03705621 0.03847361
0.03858423 0.03832555 0.03757358 0.03789806]
mean value: 0.03871757984161377
key: score_time
value: [0.01331329 0.01399422 0.01336384 0.01483345 0.01530671 0.01369762
0.01375341 0.01342893 0.01370549 0.01364231]
mean value: 0.013903927803039551
key: test_mcc
value: [0.83797038 0.82512315 0.82880708 0.9321832 0.71611487 0.89342711
0.79385662 0.85933785 0.71428571 0.96490128]
mean value: 0.8366007266767884
key: train_mcc
value: [0.90933143 0.9172256 0.89754406 0.90144111 0.91732994 0.90945587
0.90158179 0.91738682 0.90951226 0.89766562]
mean value: 0.9078474512102387
key: test_accuracy
value: [0.9122807 0.9122807 0.9122807 0.96491228 0.85714286 0.94642857
0.89285714 0.92857143 0.85714286 0.98214286]
mean value: 0.9166040100250626
key: train_accuracy
value: [0.95463511 0.95857988 0.94871795 0.95069034 0.95866142 0.95472441
0.9507874 0.95866142 0.95472441 0.9488189 ]
mean value: 0.953900122691764
key: test_fscore
value: [0.91803279 0.9122807 0.91803279 0.96428571 0.86206897 0.94545455
0.9 0.93103448 0.85714286 0.98245614]
mean value: 0.9190788981034733
key: train_fscore
value: [0.95499022 0.95841584 0.94820717 0.95029821 0.95857988 0.95463511
0.95069034 0.95841584 0.95499022 0.9486166 ]
mean value: 0.9537839421981321
key: test_precision
value: [0.84848485 0.89655172 0.875 1. 0.83333333 0.96296296
0.84375 0.9 0.85714286 0.96551724]
mean value: 0.8982742967441243
key: train_precision
value: [0.94941634 0.96414343 0.95582329 0.956 0.96047431 0.95652174
0.95256917 0.96414343 0.94941634 0.95238095]
mean value: 0.9560889000359492
key: test_recall
value: [1. 0.92857143 0.96551724 0.93103448 0.89285714 0.92857143
0.96428571 0.96428571 0.85714286 1. ]
mean value: 0.9432266009852217
key: train_recall
value: [0.96062992 0.95275591 0.94071146 0.94466403 0.95669291 0.95275591
0.9488189 0.95275591 0.96062992 0.94488189]
mean value: 0.9515296753913666
key: test_roc_auc
value: [0.9137931 0.91256158 0.91133005 0.96551724 0.85714286 0.94642857
0.89285714 0.92857143 0.85714286 0.98214286]
mean value: 0.9167487684729064
key: train_roc_auc
value: [0.95462326 0.95859139 0.94870219 0.95067847 0.95866142 0.95472441
0.9507874 0.95866142 0.95472441 0.9488189 ]
mean value: 0.9538973265693567
key: test_jcc
value: [0.84848485 0.83870968 0.84848485 0.93103448 0.75757576 0.89655172
0.81818182 0.87096774 0.75 0.96551724]
mean value: 0.8525508140357974
key: train_jcc
value: [0.91385768 0.92015209 0.90151515 0.90530303 0.92045455 0.91320755
0.90601504 0.92015209 0.91385768 0.90225564]
mean value: 0.9116770489449016
MCC on Blind test: 0.69
Accuracy on Blind test: 0.88
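Note on the lbfgs ConvergenceWarning: the warning text itself suggests raising max_iter or scaling the data. The pipeline in this log already scales with MinMaxScaler, so the remaining option is a larger iteration budget. The sketch below shows how to set it and how to check afterwards that the solver stopped before the limit. The data are synthetic and max_iter=5000 is an arbitrary illustrative value, not a setting taken from the original script.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption: the real feature table is not in the log).
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

max_iter = 5000  # assumed budget; the scikit-learn default is 100

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    ("model", LogisticRegression(random_state=42, max_iter=max_iter)),
])
pipe.fit(X, y)

# n_iter_ holds the iterations actually used; a value below max_iter
# means lbfgs stopped on its own tolerance rather than hitting the limit.
n_iter = int(pipe.named_steps["model"].n_iter_[0])
converged = n_iter < max_iter
print("iterations used:", n_iter, "converged:", converged)
```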
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.06151056 0.97917652 1.15594339 1.03205562 1.13682246 1.09652424
1.14049935 0.84171343 1.05300856 0.92840409]
mean value: 1.0425658226013184
key: score_time
value: [0.01576757 0.01351643 0.01401377 0.01967168 0.01395941 0.02089858
0.01919866 0.01392365 0.01353335 0.01369238]
mean value: 0.015817546844482423
key: test_mcc
value: [0.89988258 0.82512315 0.93202124 0.93202124 0.82618439 0.93094934
0.89802651 0.89802651 0.89342711 0.96490128]
mean value: 0.900056335305218
key: train_mcc
value: [0.99211042 0.98028384 0.98817342 0.99211042 1. 0.98819663
1. 1. 0.99212598 0.99212598]
mean value: 0.9925126702482893
key: test_accuracy
value: [0.94736842 0.9122807 0.96491228 0.96491228 0.91071429 0.96428571
0.94642857 0.94642857 0.94642857 0.98214286]
mean value: 0.9485902255639097
key: train_accuracy
value: [0.99605523 0.99013807 0.99408284 0.99605523 1. 0.99409449
1. 1. 0.99606299 0.99606299]
mean value: 0.9962551833387691
key: test_fscore
value: [0.94915254 0.9122807 0.96666667 0.96666667 0.91525424 0.96296296
0.94915254 0.94915254 0.94736842 0.98245614]
mean value: 0.9501113423860971
key: train_fscore
value: [0.99606299 0.99013807 0.99408284 0.99604743 1. 0.99410609
1. 1. 0.99606299 0.99606299]
mean value: 0.9962563404879103
key: test_precision
value: [0.90322581 0.89655172 0.93548387 0.93548387 0.87096774 1.
0.90322581 0.90322581 0.93103448 0.96551724]
mean value: 0.9244716351501668
key: train_precision
value: [0.99606299 0.99209486 0.99212598 0.99604743 1. 0.99215686
1. 1. 0.99606299 0.99606299]
mean value: 0.9960614115865138
key: test_recall
value: [1. 0.92857143 1. 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9785714285714285
key: train_recall
value: [0.99606299 0.98818898 0.99604743 0.99604743 1. 0.99606299
1. 1. 0.99606299 0.99606299]
mean value: 0.9964535806541969
key: test_roc_auc
value: [0.94827586 0.91256158 0.96428571 0.96428571 0.91071429 0.96428571
0.94642857 0.94642857 0.94642857 0.98214286]
mean value: 0.9485837438423645
key: train_roc_auc
value: [0.99605521 0.99014192 0.99408671 0.99605521 1. 0.99409449
1. 1. 0.99606299 0.99606299]
mean value: 0.9962559521956988
key: test_jcc
value: [0.90322581 0.83870968 0.93548387 0.93548387 0.84375 0.92857143
0.90322581 0.90322581 0.9 0.96551724]
mean value: 0.9057193508660416
key: train_jcc
value: [0.99215686 0.98046875 0.98823529 0.99212598 1. 0.98828125
1. 1. 0.99215686 0.99215686]
mean value: 0.992558186660491
MCC on Blind test: 0.77
Accuracy on Blind test: 0.92
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01462245 0.01250768 0.01081085 0.01020527 0.01017833 0.01032519
0.01021409 0.01031613 0.01019001 0.01003003]
mean value: 0.010940003395080566
key: score_time
value: [0.01250744 0.00965738 0.00924444 0.00913548 0.00906038 0.00908661
0.00905704 0.00912213 0.00909281 0.00910807]
mean value: 0.009507179260253906
key: test_mcc
value: [0.59358067 0.6166424 0.75492611 0.58562417 0.53881591 0.71611487
0.65814518 0.5728919 0.72168784 0.78571429]
mean value: 0.6544143324803186
key: train_mcc
value: [0.73570695 0.73999638 0.7556462 0.70845665 0.69724436 0.72930229
0.77174925 0.73248786 0.78860037 0.67735436]
mean value: 0.7336544661308108
key: test_accuracy
value: [0.78947368 0.80701754 0.87719298 0.78947368 0.76785714 0.85714286
0.82142857 0.78571429 0.85714286 0.89285714]
mean value: 0.82453007518797
key: train_accuracy
value: [0.8678501 0.86982249 0.87771203 0.85404339 0.84448819 0.86417323
0.88582677 0.86614173 0.89370079 0.83858268]
mean value: 0.8662341393716319
key: test_fscore
value: [0.80645161 0.79245283 0.87719298 0.77777778 0.75471698 0.86206897
0.83870968 0.77777778 0.84615385 0.89285714]
mean value: 0.8226159594183262
key: train_fscore
value: [0.8678501 0.87209302 0.87890625 0.85603113 0.8315565 0.86756238
0.88671875 0.86770428 0.89655172 0.84046693]
mean value: 0.8665441063880106
key: test_precision
value: [0.73529412 0.84 0.89285714 0.84 0.8 0.83333333
0.76470588 0.80769231 0.91666667 0.89285714]
mean value: 0.8323406593406594
key: train_precision
value: [0.86956522 0.85877863 0.86872587 0.84291188 0.90697674 0.84644195
0.87984496 0.85769231 0.87313433 0.83076923]
mean value: 0.8634841109277654
key: test_recall
value: [0.89285714 0.75 0.86206897 0.72413793 0.71428571 0.89285714
0.92857143 0.75 0.78571429 0.89285714]
mean value: 0.8193349753694581
key: train_recall
value: [0.86614173 0.88582677 0.88932806 0.86956522 0.76771654 0.88976378
0.89370079 0.87795276 0.92125984 0.8503937 ]
mean value: 0.8711649186144222
key: test_roc_auc
value: [0.79125616 0.80603448 0.87746305 0.79064039 0.76785714 0.85714286
0.82142857 0.78571429 0.85714286 0.89285714]
mean value: 0.8247536945812808
key: train_roc_auc
value: [0.86785347 0.86979086 0.8777349 0.85407395 0.84448819 0.86417323
0.88582677 0.86614173 0.89370079 0.83858268]
mean value: 0.8662366561887274
key: test_jcc
value: [0.67567568 0.65625 0.78125 0.63636364 0.60606061 0.75757576
0.72222222 0.63636364 0.73333333 0.80645161]
mean value: 0.7011546480498093
key: train_jcc
value: [0.76655052 0.77319588 0.78397213 0.74829932 0.71167883 0.76610169
0.79649123 0.76632302 0.8125 0.72483221]
mean value: 0.7649944838022477
MCC on Blind test: 0.43
Accuracy on Blind test: 0.78
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01057291 0.01041102 0.01036787 0.01058412 0.01035976 0.01047611
0.01075315 0.01133537 0.01048946 0.01053786]
mean value: 0.010588765144348145
key: score_time
value: [0.00910211 0.00900173 0.00892186 0.0090301 0.00900674 0.00920486
0.00922561 0.00952029 0.00915504 0.00927591]
mean value: 0.009144425392150879
key: test_mcc
value: [0.80817326 0.57973205 0.43842365 0.43842365 0.57735027 0.64285714
0.58501794 0.64285714 0.47187011 0.53605627]
mean value: 0.5720761462070288
key: train_mcc
value: [0.59369456 0.64499463 0.64499463 0.63709364 0.62999938 0.56756289
0.64173726 0.60292787 0.63779528 0.59849942]
mean value: 0.6199299551759412
key: test_accuracy
value: [0.89473684 0.78947368 0.71929825 0.71929825 0.78571429 0.82142857
0.78571429 0.82142857 0.73214286 0.76785714]
mean value: 0.7837092731829574
key: train_accuracy
value: [0.79684418 0.82248521 0.82248521 0.81854043 0.81496063 0.78346457
0.82086614 0.8011811 0.81889764 0.7992126 ]
mean value: 0.8098937706751153
key: test_fscore
value: [0.90322581 0.77777778 0.72413793 0.72413793 0.76923077 0.82142857
0.80645161 0.82142857 0.70588235 0.76363636]
mean value: 0.7817337687867034
key: train_fscore
value: [0.79684418 0.82213439 0.82283465 0.81746032 0.81640625 0.77822581
0.82121807 0.79678068 0.81889764 0.79761905]
mean value: 0.8088421032567705
key: test_precision
value: [0.82352941 0.80769231 0.72413793 0.72413793 0.83333333 0.82142857
0.73529412 0.82142857 0.7826087 0.77777778]
mean value: 0.7851368648793466
key: train_precision
value: [0.79841897 0.82539683 0.81960784 0.82071713 0.81007752 0.79752066
0.81960784 0.81481481 0.81889764 0.804 ]
mean value: 0.8129059248624415
key: test_recall
value: [1. 0.75 0.72413793 0.72413793 0.71428571 0.82142857
0.89285714 0.82142857 0.64285714 0.75 ]
mean value: 0.7841133004926109
key: train_recall
value: [0.79527559 0.81889764 0.82608696 0.81422925 0.82283465 0.75984252
0.82283465 0.77952756 0.81889764 0.79133858]
mean value: 0.8049765024431235
key: test_roc_auc
value: [0.89655172 0.7887931 0.71921182 0.71921182 0.78571429 0.82142857
0.78571429 0.82142857 0.73214286 0.76785714]
mean value: 0.7838054187192118
key: train_roc_auc
value: [0.79684728 0.8224923 0.8224923 0.81853195 0.81496063 0.78346457
0.82086614 0.8011811 0.81889764 0.7992126 ]
mean value: 0.8098946500264542
key: test_jcc
value: [0.82352941 0.63636364 0.56756757 0.56756757 0.625 0.6969697
0.67567568 0.6969697 0.54545455 0.61764706]
mean value: 0.6452744857156621
key: train_jcc
value: [0.66229508 0.69798658 0.69899666 0.69127517 0.68976898 0.6369637
0.69666667 0.66220736 0.69333333 0.66336634]
mean value: 0.6792859850212573
MCC on Blind test: 0.2
Accuracy on Blind test: 0.69
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01003718 0.01101255 0.01126194 0.01116419 0.01099014 0.01097584
0.01107812 0.01125479 0.01113129 0.01112223]
mean value: 0.011002826690673827
key: score_time
value: [0.01278234 0.0125072 0.01366591 0.01346803 0.01265216 0.01326346
0.0132699 0.01311994 0.01899934 0.01349974]
mean value: 0.013722801208496093
key: test_mcc
value: [0.62036458 0.54377353 0.45409716 0.66268617 0.57735027 0.68965631
0.53881591 0.68250015 0.46697379 0.60753044]
mean value: 0.5843748306991364
key: train_mcc
value: [0.76941166 0.76071428 0.79980738 0.76082422 0.79356189 0.79286644
0.77572829 0.76054069 0.74294954 0.78364389]
mean value: 0.7740048286091203
key: test_accuracy
value: [0.78947368 0.77192982 0.71929825 0.8245614 0.78571429 0.83928571
0.76785714 0.83928571 0.73214286 0.80357143]
mean value: 0.787312030075188
key: train_accuracy
value: [0.8816568 0.87771203 0.8974359 0.87771203 0.89566929 0.89173228
0.88582677 0.87795276 0.87007874 0.88976378]
mean value: 0.884554038733324
key: test_fscore
value: [0.81818182 0.76363636 0.75757576 0.84375 0.8 0.85245902
0.77966102 0.84745763 0.71698113 0.8 ]
mean value: 0.797970273193065
key: train_fscore
value: [0.88888889 0.88475836 0.90262172 0.88432836 0.89943074 0.89945155
0.89138577 0.88432836 0.8754717 0.89513109]
mean value: 0.8905796538479781
key: test_precision
value: [0.71052632 0.77777778 0.67567568 0.77142857 0.75 0.78787879
0.74193548 0.80645161 0.76 0.81481481]
mean value: 0.7596489040139295
key: train_precision
value: [0.83916084 0.83802817 0.85765125 0.83745583 0.86813187 0.83959044
0.85 0.84042553 0.84057971 0.85357143]
mean value: 0.8464595066564342
key: test_recall
value: [0.96428571 0.75 0.86206897 0.93103448 0.85714286 0.92857143
0.82142857 0.89285714 0.67857143 0.78571429]
mean value: 0.847167487684729
key: train_recall
value: [0.94488189 0.93700787 0.95256917 0.93675889 0.93307087 0.96850394
0.93700787 0.93307087 0.91338583 0.94094488]
mean value: 0.9397202078989139
key: test_roc_auc
value: [0.79248768 0.77155172 0.71674877 0.8226601 0.78571429 0.83928571
0.76785714 0.83928571 0.73214286 0.80357143]
mean value: 0.7871305418719212
key: train_roc_auc
value: [0.88153185 0.87759485 0.89754443 0.87782827 0.89566929 0.89173228
0.88582677 0.87795276 0.87007874 0.88976378]
mean value: 0.8845523015156702
key: test_jcc
value: [0.69230769 0.61764706 0.6097561 0.72972973 0.66666667 0.74285714
0.63888889 0.73529412 0.55882353 0.66666667]
mean value: 0.6658637590560116
key: train_jcc
value: [0.8 0.79333333 0.8225256 0.79264214 0.81724138 0.81727575
0.80405405 0.79264214 0.77852349 0.81016949]
mean value: 0.8028407373870428
MCC on Blind test: 0.24
Accuracy on Blind test: 0.7
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02933192 0.02366972 0.0223372 0.02245879 0.02203465 0.02228165
0.02257109 0.02298236 0.02250648 0.02265596]
mean value: 0.023282980918884276
key: score_time
value: [0.01401258 0.01203036 0.01217723 0.01243663 0.01200104 0.01210237
0.01222324 0.0121057 0.01228499 0.01206851]
mean value: 0.012344264984130859
key: test_mcc
value: [0.75047877 0.7589669 0.7257422 0.96551724 0.71428571 0.89342711
0.73127242 0.68250015 0.64450339 0.93094934]
mean value: 0.7797643240554251
key: train_mcc
value: [0.83222561 0.85928385 0.83474492 0.8364528 0.84004879 0.88213591
0.84756752 0.84293789 0.84004879 0.85869374]
mean value: 0.8474139833983546
key: test_accuracy
value: [0.85964912 0.87719298 0.85964912 0.98245614 0.85714286 0.94642857
0.85714286 0.83928571 0.82142857 0.96428571]
mean value: 0.8864661654135338
key: train_accuracy
value: [0.91518738 0.92899408 0.91715976 0.91715976 0.91929134 0.94094488
0.92322835 0.92125984 0.91929134 0.92913386]
mean value: 0.9231650592492506
key: test_fscore
value: [0.875 0.88135593 0.87096774 0.98245614 0.85714286 0.94545455
0.87096774 0.84745763 0.81481481 0.96551724]
mean value: 0.8911134642335407
key: train_fscore
value: [0.91809524 0.93103448 0.91828794 0.91984733 0.92160612 0.94163424
0.92514395 0.92248062 0.92160612 0.93023256]
mean value: 0.9249968597409465
key: test_precision
value: [0.77777778 0.83870968 0.81818182 1. 0.85714286 0.96296296
0.79411765 0.80645161 0.84615385 0.93333333]
mean value: 0.8634831532934
key: train_precision
value: [0.88929889 0.90671642 0.90421456 0.88929889 0.89591078 0.93076923
0.90262172 0.90839695 0.89591078 0.91603053]
mean value: 0.9039168759145274
key: test_recall
value: [1. 0.92857143 0.93103448 0.96551724 0.85714286 0.92857143
0.96428571 0.89285714 0.78571429 1. ]
mean value: 0.9253694581280788
key: train_recall
value: [0.9488189 0.95669291 0.93280632 0.95256917 0.9488189 0.95275591
0.9488189 0.93700787 0.9488189 0.94488189]
mean value: 0.9471989667299493
key: test_roc_auc
value: [0.86206897 0.87807882 0.85837438 0.98275862 0.85714286 0.94642857
0.85714286 0.83928571 0.82142857 0.96428571]
mean value: 0.8866995073891626
key: train_roc_auc
value: [0.91512091 0.92893934 0.91719056 0.91722947 0.91929134 0.94094488
0.92322835 0.92125984 0.91929134 0.92913386]
mean value: 0.9231629890137251
key: test_jcc
value: [0.77777778 0.78787879 0.77142857 0.96551724 0.75 0.89655172
0.77142857 0.73529412 0.6875 0.93333333]
mean value: 0.8076710125011343
key: train_jcc
value: [0.84859155 0.87096774 0.84892086 0.85159011 0.85460993 0.88970588
0.86071429 0.85611511 0.85460993 0.86956522]
mean value: 0.8605390612075907
MCC on Blind test: 0.66
Accuracy on Blind test: 0.88
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.33146954 2.55629253 2.25351214 2.16796517 2.04874682 2.0742538
2.40498137 2.58180976 2.5396452 2.15567756]
mean value: 2.3114353895187376
key: score_time
value: [0.01615882 0.01403785 0.01418424 0.026016 0.01434445 0.01499295
0.03440094 0.01265144 0.01818609 0.02401257]
mean value: 0.018898534774780273
key: test_mcc
value: [0.9321832 0.89988258 0.93202124 0.96547546 0.85933785 0.89342711
0.89802651 0.89802651 0.96490128 0.96490128]
mean value: 0.9208183019462288
key: train_mcc
value: [0.99606293 0.99606293 0.99606299 0.99606299 1. 0.99607071
0.99212598 1. 0.99212598 0.99607071]
mean value: 0.9960645238155992
key: test_accuracy
value: [0.96491228 0.94736842 0.96491228 0.98245614 0.92857143 0.94642857
0.94642857 0.94642857 0.98214286 0.98214286]
mean value: 0.9591791979949874
key: train_accuracy
value: [0.99802761 0.99802761 0.99802761 0.99802761 1. 0.9980315
0.99606299 1. 0.99606299 0.9980315 ]
mean value: 0.9980299430026868
key: test_fscore
value: [0.96551724 0.94915254 0.96666667 0.98305085 0.93103448 0.94545455
0.94915254 0.94915254 0.98181818 0.98245614]
mean value: 0.9603455733004473
key: train_fscore
value: [0.99803536 0.99803536 0.99802761 0.99802761 1. 0.99803536
0.99606299 1. 0.99606299 0.99803536]
mean value: 0.9980322664907467
key: test_precision
value: [0.93333333 0.90322581 0.93548387 0.96666667 0.9 0.96296296
0.90322581 0.90322581 1. 0.96551724]
mean value: 0.9373641494664854
key: train_precision
value: [0.99607843 0.99607843 0.99606299 0.99606299 1. 0.99607843
0.99606299 1. 0.99606299 0.99607843]
mean value: 0.9968565693994134
key: test_recall
value: [1. 1. 1. 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9857142857142858
key: train_recall
value: [1. 1. 1. 1. 1. 1.
0.99606299 1. 0.99606299 1. ]
mean value: 0.9992125984251968
key: test_roc_auc
value: [0.96551724 0.94827586 0.96428571 0.98214286 0.92857143 0.94642857
0.94642857 0.94642857 0.98214286 0.98214286]
mean value: 0.9592364532019705
key: train_roc_auc
value: [0.99802372 0.99802372 0.9980315 0.9980315 1. 0.9980315
0.99606299 1. 0.99606299 0.9980315 ]
mean value: 0.9980299399333976
key: test_jcc
value: [0.93333333 0.90322581 0.93548387 0.96666667 0.87096774 0.89655172
0.90322581 0.90322581 0.96428571 0.96551724]
mean value: 0.924248371206102
key: train_jcc
value: [0.99607843 0.99607843 0.99606299 0.99606299 1. 0.99607843
0.99215686 1. 0.99215686 0.99607843]
mean value: 0.9960753435232361
MCC on Blind test: 0.72
Accuracy on Blind test: 0.9
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03225613 0.02523661 0.02279449 0.02276397 0.02029443 0.02316093
0.02214956 0.02145815 0.02218223 0.02365208]
mean value: 0.02359485626220703
key: score_time
value: [0.01220226 0.00912762 0.00895953 0.00886941 0.00896859 0.00906873
0.00915599 0.00904799 0.00930071 0.00928521]
mean value: 0.009398603439331054
key: test_mcc
value: [1. 0.9321832 1. 1. 0.73127242 1.
0.89802651 0.93094934 0.92857143 1. ]
mean value: 0.9421002898482495
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.96491228 1. 1. 0.85714286 1.
0.94642857 0.96428571 0.96428571 1. ]
mean value: 0.9697055137844611
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.96551724 1. 1. 0.87096774 1.
0.94915254 0.96551724 0.96428571 1. ]
mean value: 0.9715440481352701
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.93333333 1. 1. 0.79411765 1.
0.90322581 0.93333333 0.96428571 1. ]
mean value: 0.9528295834462818
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.96428571 1.
1. 1. 0.96428571 1. ]
mean value: 0.9928571428571429
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.96551724 1. 1. 0.85714286 1.
0.94642857 0.96428571 0.96428571 1. ]
mean value: 0.9697660098522167
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.93333333 1. 1. 0.77142857 1.
0.90322581 0.93333333 0.93103448 1. ]
mean value: 0.9472355527305472
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.96
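Note: the per-fold test_mcc and test_jcc values above can be reproduced from confusion-matrix counts. The sketch below is a minimal standard-library illustration, not the script's own code. The counts (TP=27, TN=27, FP=1, FN=1) are an assumption inferred from the ninth Decision Tree fold, where accuracy, precision and recall are all 0.96428571; they are not printed in the log. The helper names mcc and jaccard are hypothetical.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def jaccard(tp, fp, fn):
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    union = tp + fp + fn
    return tp / union if union else 0.0

# Assumed counts for a 56-sample fold with 28 positives (inferred, not logged).
fold_mcc = mcc(tp=27, tn=27, fp=1, fn=1)
fold_jcc = jaccard(tp=27, fp=1, fn=1)
print(f"MCC={fold_mcc:.8f}  Jaccard={fold_jcc:.8f}")
```

Under these assumed counts the results agree with the logged fold values 0.92857143 (test_mcc) and 0.93103448 (test_jcc).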
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.12735271 0.1331346 0.13445926 0.12076545 0.1190021 0.1195786
0.11903858 0.12229443 0.12137938 0.11910081]
mean value: 0.12361059188842774
key: score_time
value: [0.0200398 0.02035165 0.02030778 0.01841545 0.01809835 0.01818275
0.01851726 0.01960111 0.01816249 0.01938963]
mean value: 0.019106626510620117
key: test_mcc
value: [0.96551724 0.96551724 0.96547546 1. 0.89342711 0.89342711
0.93094934 0.96490128 0.96490128 1. ]
mean value: 0.9544116062967756
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.98245614 0.98245614 1. 0.94642857 0.94642857
0.96428571 0.98214286 0.98214286 1. ]
mean value: 0.9768796992481202
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98245614 0.98245614 0.98305085 1. 0.94736842 0.94545455
0.96551724 0.98245614 0.98181818 1. ]
mean value: 0.9770577658214927
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96551724 0.96551724 0.96666667 1. 0.93103448 0.96296296
0.93333333 0.96551724 1. 1. ]
mean value: 0.9690549169859515
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9857142857142858
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98275862 0.98275862 0.98214286 1. 0.94642857 0.94642857
0.96428571 0.98214286 0.98214286 1. ]
mean value: 0.976908866995074
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96551724 0.96551724 0.96666667 1. 0.9 0.89655172
0.93333333 0.96551724 0.96428571 1. ]
mean value: 0.9557389162561577
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.6
Accuracy on Blind test: 0.88
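Note: each metric in this log is printed as a "key:" line, a "value:" array that may wrap across lines, and a "mean value:" line. The sketch below shows one way such blocks might be parsed back into a dictionary using only the standard library. The function name parse_metric_blocks and the SAMPLE excerpt (copied from the Extra Trees block above) are illustrative assumptions, not part of the original script.

```python
import re

# Excerpt copied from the Extra Trees block of this log.
SAMPLE = """key: test_mcc
value: [0.96551724 0.96551724 0.96547546 1. 0.89342711 0.89342711
0.93094934 0.96490128 0.96490128 1. ]
mean value: 0.9544116062967756
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
"""

# Matches floats as NumPy prints them, including the bare "1." form.
NUMBER = re.compile(r"-?\d+\.?\d*(?:[eE][-+]?\d+)?")

def parse_metric_blocks(text):
    """Return {metric: {"values": [...], "mean": float}} from log text."""
    metrics = {}
    key = None
    collecting = False  # True while inside a (possibly wrapped) value array
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("key:"):
            key = line.split(":", 1)[1].strip()
            metrics[key] = {"values": [], "mean": None}
            collecting = False
        elif line.startswith("mean value:") and key is not None:
            metrics[key]["mean"] = float(line.split(":", 1)[1])
            collecting = False
        elif line.startswith("value:") and key is not None:
            collecting = True
            line = line.split(":", 1)[1]
        if collecting and key is not None:
            metrics[key]["values"] += [float(t) for t in NUMBER.findall(line)]
            if "]" in line:  # closing bracket ends the array
                collecting = False
    return metrics

parsed = parse_metric_blocks(SAMPLE)
print(parsed["test_mcc"]["mean"], len(parsed["test_mcc"]["values"]))
```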
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01039386 0.01049614 0.01059151 0.01052785 0.01056314 0.01058316
0.0104866 0.01042509 0.01065969 0.01042533]
mean value: 0.010515236854553222
key: score_time
value: [0.00886655 0.00900602 0.00947428 0.00894213 0.00898314 0.00901628
0.00893044 0.00892806 0.0092063 0.0089128 ]
mean value: 0.009026598930358887
key: test_mcc
value: [0.77903565 0.86189955 0.74822828 0.74822828 0.79385662 0.78772636
0.6882472 0.89802651 0.82618439 0.8660254 ]
mean value: 0.7997458248025408
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.87719298 0.92982456 0.85964912 0.85964912 0.89285714 0.89285714
0.82142857 0.94642857 0.91071429 0.92857143]
mean value: 0.8919172932330827
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.93103448 0.87878788 0.87878788 0.9 0.89655172
0.84848485 0.94915254 0.91525424 0.93333333]
mean value: 0.9020275814840397
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.9 0.78378378 0.78378378 0.84375 0.86666667
0.73684211 0.90322581 0.87096774 0.875 ]
mean value: 0.8364019887884488
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96428571 1. 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9821428571428572
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.87931034 0.93041872 0.85714286 0.85714286 0.89285714 0.89285714
0.82142857 0.94642857 0.91071429 0.92857143]
mean value: 0.8916871921182267
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.87096774 0.78378378 0.78378378 0.81818182 0.8125
0.73684211 0.90322581 0.84375 0.875 ]
mean value: 0.822803503939964
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.53
Accuracy on Blind test: 0.86
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.72128773 1.77692485 1.74783516 1.83495116 1.73265433 1.9805038
1.75580668 1.82116747 1.85459757 1.74639463]
mean value: 1.797212338447571
key: score_time
value: [0.10111117 0.09629679 0.09658933 0.11028838 0.09789872 0.10091949
0.10473514 0.12026238 0.09599161 0.09527445]
mean value: 0.10193674564361573
key: test_mcc
value: [1. 0.96551724 1. 1. 0.89342711 0.93094934
0.93094934 0.93094934 0.92857143 1. ]
mean value: 0.9580363791069356
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.98245614 1. 1. 0.94642857 0.96428571
0.96428571 0.96428571 0.96428571 1. ]
mean value: 0.9786027568922305
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.98245614 1. 1. 0.94736842 0.96296296
0.96551724 0.96551724 0.96428571 1. ]
mean value: 0.9788107721410807
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96551724 1. 1. 0.93103448 1.
0.93333333 0.93333333 0.96428571 1. ]
mean value: 0.9727504105090312
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9857142857142858
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.98275862 1. 1. 0.94642857 0.96428571
0.96428571 0.96428571 0.96428571 1. ]
mean value: 0.9786330049261084
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.96551724 1. 1. 0.9 0.92857143
0.93333333 0.93333333 0.93103448 1. ]
mean value: 0.9591789819376026
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.79
Accuracy on Blind test: 0.93
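Note: the log reports only the mean of the ten fold scores. As a small illustration, the snippet below recomputes the Random Forest test_mcc mean from the fold values printed above and also derives the fold-to-fold standard deviation, which the log does not report. The variable names are assumptions introduced for this example.

```python
from statistics import fmean, stdev

# Random Forest test_mcc fold values, copied from the block above.
rf_test_mcc = [
    1.0, 0.96551724, 1.0, 1.0, 0.89342711,
    0.93094934, 0.93094934, 0.93094934, 0.92857143, 1.0,
]

rf_mean = fmean(rf_test_mcc)   # should agree with the logged "mean value"
rf_spread = stdev(rf_test_mcc) # sample standard deviation across folds

print(f"mean test_mcc = {rf_mean:.6f}, fold std = {rf_spread:.6f}")
```

The recomputed mean agrees with the logged 0.9580363791069356 to within the rounding of the printed fold values.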
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.93988156 0.98185563 0.95095325 0.95623612 1.0021174 1.01571774
0.96705437 0.9789834 1.06533718 0.96144629]
mean value: 0.9819582939147949
key: score_time
value: [0.17660475 0.25053692 0.26670814 0.16711521 0.25986362 0.19897556
0.23333478 0.21554923 0.25029039 0.24634266]
mean value: 0.22653212547302246
key: test_mcc
value: [0.96551724 0.8951918 1. 1. 0.93094934 0.93094934
0.93094934 0.93094934 1. 1. ]
mean value: 0.958450638898162
key: train_mcc
value: [0.97660378 0.9685613 0.98046755 0.97275888 0.98437404 0.98050495
0.98437404 0.98437404 0.98050495 0.98050495]
mean value: 0.9793028481235929
key: test_accuracy
value: [0.98245614 0.94736842 1. 1. 0.96428571 0.96428571
0.96428571 0.96428571 1. 1. ]
mean value: 0.9786967418546366
key: train_accuracy
value: [0.98816568 0.98422091 0.99013807 0.98619329 0.99212598 0.99015748
0.99212598 0.99212598 0.99015748 0.99015748]
mean value: 0.9895568342418737
key: test_fscore
value: [0.98245614 0.94545455 1. 1. 0.96551724 0.96296296
0.96551724 0.96551724 1. 1. ]
mean value: 0.9787425372906317
key: train_fscore
value: [0.98832685 0.984375 0.99021526 0.98635478 0.9921875 0.99025341
0.9921875 0.9921875 0.99025341 0.99025341]
mean value: 0.9896594622183483
key: test_precision
value: [0.96551724 0.96296296 1. 1. 0.93333333 1.
0.93333333 0.93333333 1. 1. ]
mean value: 0.9728480204342274
key: train_precision
value: [0.97692308 0.97674419 0.98062016 0.97307692 0.98449612 0.98069498
0.98449612 0.98449612 0.98069498 0.98069498]
mean value: 0.9802937655263236
key: test_recall
value: [1. 0.92857143 1. 1. 1. 0.92857143
1. 1. 1. 1. ]
mean value: 0.9857142857142858
key: train_recall
value: [1. 0.99212598 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9992125984251968
key: test_roc_auc
value: [0.98275862 0.94704433 1. 1. 0.96428571 0.96428571
0.96428571 0.96428571 1. 1. ]
mean value: 0.9786945812807882
key: train_roc_auc
value: [0.98814229 0.98420528 0.99015748 0.98622047 0.99212598 0.99015748
0.99212598 0.99212598 0.99015748 0.99015748]
mean value: 0.9895575923562915
key: test_jcc
value: [0.96551724 0.89655172 1. 1. 0.93333333 0.92857143
0.93333333 0.93333333 1. 1. ]
mean value: 0.959064039408867
key: train_jcc
value: [0.97692308 0.96923077 0.98062016 0.97307692 0.98449612 0.98069498
0.98449612 0.98449612 0.98069498 0.98069498]
mean value: 0.9795424238447494
MCC on Blind test: 0.83
Accuracy on Blind test: 0.94
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02351713 0.01024127 0.01087022 0.01184297 0.01054907 0.01104808
0.01173329 0.01041317 0.0110364 0.01116943]
mean value: 0.01224210262298584
key: score_time
value: [0.0089438 0.00970793 0.00972652 0.00989699 0.00950623 0.00932384
0.0098536 0.00955272 0.00994015 0.00911736]
mean value: 0.009556913375854492
key: test_mcc
value: [0.80817326 0.57973205 0.43842365 0.43842365 0.57735027 0.64285714
0.58501794 0.64285714 0.47187011 0.53605627]
mean value: 0.5720761462070288
key: train_mcc
value: [0.59369456 0.64499463 0.64499463 0.63709364 0.62999938 0.56756289
0.64173726 0.60292787 0.63779528 0.59849942]
mean value: 0.6199299551759412
key: test_accuracy
value: [0.89473684 0.78947368 0.71929825 0.71929825 0.78571429 0.82142857
0.78571429 0.82142857 0.73214286 0.76785714]
mean value: 0.7837092731829574
key: train_accuracy
value: [0.79684418 0.82248521 0.82248521 0.81854043 0.81496063 0.78346457
0.82086614 0.8011811 0.81889764 0.7992126 ]
mean value: 0.8098937706751153
key: test_fscore
value: [0.90322581 0.77777778 0.72413793 0.72413793 0.76923077 0.82142857
0.80645161 0.82142857 0.70588235 0.76363636]
mean value: 0.7817337687867034
key: train_fscore
value: [0.79684418 0.82213439 0.82283465 0.81746032 0.81640625 0.77822581
0.82121807 0.79678068 0.81889764 0.79761905]
mean value: 0.8088421032567705
key: test_precision
value: [0.82352941 0.80769231 0.72413793 0.72413793 0.83333333 0.82142857
0.73529412 0.82142857 0.7826087 0.77777778]
mean value: 0.7851368648793466
key: train_precision
value: [0.79841897 0.82539683 0.81960784 0.82071713 0.81007752 0.79752066
0.81960784 0.81481481 0.81889764 0.804 ]
mean value: 0.8129059248624415
key: test_recall
value: [1. 0.75 0.72413793 0.72413793 0.71428571 0.82142857
0.89285714 0.82142857 0.64285714 0.75 ]
mean value: 0.7841133004926109
key: train_recall
value: [0.79527559 0.81889764 0.82608696 0.81422925 0.82283465 0.75984252
0.82283465 0.77952756 0.81889764 0.79133858]
mean value: 0.8049765024431235
key: test_roc_auc
value: [0.89655172 0.7887931 0.71921182 0.71921182 0.78571429 0.82142857
0.78571429 0.82142857 0.73214286 0.76785714]
mean value: 0.7838054187192118
key: train_roc_auc
value: [0.79684728 0.8224923 0.8224923 0.81853195 0.81496063 0.78346457
0.82086614 0.8011811 0.81889764 0.7992126 ]
mean value: 0.8098946500264542
key: test_jcc
value: [0.82352941 0.63636364 0.56756757 0.56756757 0.625 0.6969697
0.67567568 0.6969697 0.54545455 0.61764706]
mean value: 0.6452744857156621
key: train_jcc
value: [0.66229508 0.69798658 0.69899666 0.69127517 0.68976898 0.6369637
0.69666667 0.66220736 0.69333333 0.66336634]
mean value: 0.6792859850212573
MCC on Blind test: 0.2
Accuracy on Blind test: 0.69
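Note: for the six models logged in this stretch, the cross-validated mean test_mcc and the blind-test MCC can be placed side by side to show how far each model drops on unseen data. The snippet below is a hedged summary sketch: the numbers are copied from the blocks above (CV means rounded to four decimals), while the tuple layout and variable names are assumptions made for illustration.

```python
# (model name, mean CV test_mcc, MCC on blind test), copied from the log above.
results = [
    ("Decision Tree",  0.9421, 0.87),
    ("Extra Trees",    0.9544, 0.60),
    ("Extra Tree",     0.7997, 0.53),
    ("Random Forest",  0.9580, 0.79),
    ("Random Forest2", 0.9585, 0.83),
    ("Naive Bayes",    0.5721, 0.20),
]

# Rank by blind-test MCC, the score measured on held-out data.
ranked = sorted(results, key=lambda r: r[2], reverse=True)

# Gap between CV estimate and blind-test score; a large gap hints at
# overfitting or a distribution shift between the two sets.
gaps = {name: round(cv - blind, 4) for name, cv, blind in results}

for name, cv, blind in ranked:
    print(f"{name:<15} cv_mcc={cv:.4f}  blind_mcc={blind:.2f}  gap={gaps[name]:.4f}")

best_blind = ranked[0][0]
largest_gap = max(gaps, key=gaps.get)
```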
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.08453417 0.06654429 0.07418919 0.08602238 0.06946802 0.08242369
0.06857514 0.07357121 0.07413006 0.07933426]
mean value: 0.07587924003601074
key: score_time
value: [0.01240754 0.01080799 0.01115561 0.01121211 0.01110983 0.01113629
0.01076126 0.01157236 0.01180458 0.0112443 ]
mean value: 0.011321187019348145
key: test_mcc
value: [1. 0.9321832 1. 0.96547546 0.89802651 1.
0.96490128 0.93094934 0.96490128 1. ]
mean value: 0.965643706501215
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.96491228 1. 0.98245614 0.94642857 1.
0.98214286 0.96428571 0.98214286 1. ]
mean value: 0.9822368421052632
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.96551724 1. 0.98305085 0.94915254 1.
0.98245614 0.96551724 0.98245614 1. ]
mean value: 0.9828150153290883
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.93333333 1. 0.96666667 0.90322581 1.
0.96551724 0.93333333 0.96551724 1. ]
mean value: 0.9667593622543567
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.96551724 1. 0.98214286 0.94642857 1.
0.98214286 0.96428571 0.98214286 1. ]
mean value: 0.9822660098522168
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.93333333 1. 0.96666667 0.90322581 1.
0.96551724 0.93333333 0.96551724 1. ]
mean value: 0.9667593622543567
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.88
Accuracy on Blind test: 0.96
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.05864573 0.08753681 0.07722974 0.0898056 0.0872829 0.07127666
0.08494568 0.06207895 0.05319452 0.06795883]
mean value: 0.07399554252624511
key: score_time
value: [0.01858926 0.01282024 0.02478695 0.01922059 0.01901817 0.01264048
0.01914215 0.01234865 0.01900911 0.01905823]
mean value: 0.01766338348388672
key: test_mcc
value: [0.86851042 0.82512315 0.8953202 0.86789789 0.82195294 0.96490128
0.82618439 0.82618439 0.96490128 0.96490128]
mean value: 0.8825877226392335
key: train_mcc
value: [0.96055211 0.97239383 0.95266254 0.95661511 0.96850394 0.95670033
0.96850394 0.97250878 0.96062992 0.96062992]
mean value: 0.9629700415209543
key: test_accuracy
value: [0.92982456 0.9122807 0.94736842 0.92982456 0.91071429 0.98214286
0.91071429 0.91071429 0.98214286 0.98214286]
mean value: 0.9397869674185464
key: train_accuracy
value: [0.98027613 0.98619329 0.97633136 0.97830375 0.98425197 0.97834646
0.98425197 0.98622047 0.98031496 0.98031496]
mean value: 0.9814805323890727
key: test_fscore
value: [0.93333333 0.9122807 0.94736842 0.93548387 0.90909091 0.98245614
0.91525424 0.91525424 0.98245614 0.98245614]
mean value: 0.9415434131827904
key: train_fscore
value: [0.98031496 0.98624754 0.97628458 0.97830375 0.98425197 0.978389
0.98425197 0.98613861 0.98031496 0.98031496]
mean value: 0.9814812307513464
key: test_precision
value: [0.875 0.89655172 0.96428571 0.87878788 0.92592593 0.96551724
0.87096774 0.87096774 0.96551724 0.96551724]
mean value: 0.9179038451146349
key: train_precision
value: [0.98031496 0.98431373 0.97628458 0.97637795 0.98425197 0.97647059
0.98425197 0.99203187 0.98031496 0.98031496]
mean value: 0.9814927542869231
key: test_recall
value: [1. 0.92857143 0.93103448 1. 0.89285714 1.
0.96428571 0.96428571 1. 1. ]
mean value: 0.968103448275862
key: train_recall
value: [0.98031496 0.98818898 0.97628458 0.98023715 0.98425197 0.98031496
0.98425197 0.98031496 0.98031496 0.98031496]
mean value: 0.9814789455665868
key: test_roc_auc
value: [0.93103448 0.91256158 0.9476601 0.92857143 0.91071429 0.98214286
0.91071429 0.91071429 0.98214286 0.98214286]
mean value: 0.9398399014778326
key: train_roc_auc
value: [0.98027606 0.98618935 0.97633127 0.97830755 0.98425197 0.97834646
0.98425197 0.98622047 0.98031496 0.98031496]
mean value: 0.9814805016961813
key: test_jcc
value: [0.875 0.83870968 0.9 0.87878788 0.83333333 0.96551724
0.84375 0.84375 0.96551724 0.96551724]
mean value: 0.8909882613678498
key: train_jcc
value: [0.96138996 0.97286822 0.95366795 0.95752896 0.96899225 0.95769231
0.96899225 0.97265625 0.96138996 0.96138996]
mean value: 0.9636568066237398
MCC on Blind test: 0.61
Accuracy on Blind test: 0.86
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01440048 0.01289797 0.00995111 0.0097754 0.00982618 0.01017928
0.00993538 0.00999284 0.01034856 0.00996208]
mean value: 0.0107269287109375
key: score_time
value: [0.01222897 0.00954247 0.00876284 0.0087254 0.00896811 0.00879383
0.00874734 0.00874972 0.00899076 0.00880122]
mean value: 0.009231066703796387
key: test_mcc
value: [0.64889453 0.64901478 0.6166424 0.58076493 0.50128041 0.61065803
0.52174919 0.50128041 0.53881591 0.67900461]
mean value: 0.5848105187193532
key: train_mcc
value: [0.61406315 0.6462136 0.65745192 0.5827872 0.67819632 0.57508846
0.6265721 0.67097829 0.68811802 0.54745203]
mean value: 0.6286921092607803
key: test_accuracy
value: [0.80701754 0.8245614 0.80701754 0.78947368 0.75 0.80357143
0.75 0.75 0.76785714 0.83928571]
mean value: 0.7888784461152882
key: train_accuracy
value: [0.80670611 0.82248521 0.82840237 0.79092702 0.83858268 0.78740157
0.81299213 0.83464567 0.84251969 0.77362205]
mean value: 0.813828448958673
key: test_fscore
value: [0.83076923 0.82142857 0.81967213 0.78571429 0.74074074 0.81355932
0.78125 0.75862069 0.75471698 0.84210526]
mean value: 0.7948577215779411
key: train_fscore
value: [0.81153846 0.82824427 0.83172147 0.79615385 0.84291188 0.79069767
0.81695568 0.84030418 0.84962406 0.77669903]
mean value: 0.8184850560127853
key: test_precision
value: [0.72972973 0.82142857 0.78125 0.81481481 0.76923077 0.77419355
0.69444444 0.73333333 0.8 0.82758621]
mean value: 0.7746011418265312
key: train_precision
value: [0.79323308 0.8037037 0.81439394 0.7752809 0.82089552 0.77862595
0.8 0.8125 0.81294964 0.76628352]
mean value: 0.7977866266459333
key: test_recall
value: [0.96428571 0.82142857 0.86206897 0.75862069 0.71428571 0.85714286
0.89285714 0.78571429 0.71428571 0.85714286]
mean value: 0.8227832512315271
key: train_recall
value: [0.83070866 0.85433071 0.84980237 0.81818182 0.86614173 0.80314961
0.83464567 0.87007874 0.88976378 0.78740157]
mean value: 0.8404204662164265
key: test_roc_auc
value: [0.80972906 0.82450739 0.80603448 0.79002463 0.75 0.80357143
0.75 0.75 0.76785714 0.83928571]
mean value: 0.7891009852216748
key: train_roc_auc
value: [0.80665868 0.82242227 0.82844449 0.79098067 0.83858268 0.78740157
0.81299213 0.83464567 0.84251969 0.77362205]
mean value: 0.8138269895116865
key: test_jcc
value: [0.71052632 0.6969697 0.69444444 0.64705882 0.58823529 0.68571429
0.64102564 0.61111111 0.60606061 0.72727273]
mean value: 0.6608418946035045
key: train_jcc
value: [0.6828479 0.70684039 0.71192053 0.66134185 0.72847682 0.65384615
0.69055375 0.72459016 0.73856209 0.63492063]
mean value: 0.6933900281480951
MCC on Blind test: 0.59
Accuracy on Blind test: 0.83
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01517582 0.02547216 0.03090811 0.02699566 0.02723742 0.02942634
0.02590513 0.02685237 0.03164697 0.02517891]
mean value: 0.026479887962341308
key: score_time
value: [0.01010776 0.01122403 0.01200104 0.01206923 0.01203632 0.01194191
0.01193762 0.01199579 0.01200461 0.01850152]
mean value: 0.012381982803344727
key: test_mcc
value: [0.96551724 0.82512315 0.89988258 0.96551724 0.78772636 0.96490128
0.79385662 0.89802651 0.89342711 0.92857143]
mean value: 0.8922549527400198
key: train_mcc
value: [0.90714511 0.97239383 0.942062 0.94550473 0.95687833 0.93843444
0.95687833 0.98050495 0.97649905 0.96463421]
mean value: 0.954093498873719
key: test_accuracy
value: [0.98245614 0.9122807 0.94736842 0.98245614 0.89285714 0.98214286
0.89285714 0.94642857 0.94642857 0.96428571]
mean value: 0.9449561403508772
key: train_accuracy
value: [0.95266272 0.98619329 0.9704142 0.97238659 0.97834646 0.96850394
0.97834646 0.99015748 0.98818898 0.98228346]
mean value: 0.9767483576387271
key: test_fscore
value: [0.98245614 0.9122807 0.94545455 0.98245614 0.89655172 0.98245614
0.9 0.94915254 0.94736842 0.96428571]
mean value: 0.9462462070110721
key: train_fscore
value: [0.95121951 0.98624754 0.96957404 0.97177419 0.9785575 0.96934866
0.9785575 0.99025341 0.98828125 0.98217822]
mean value: 0.9765991834337232
key: test_precision
value: [0.96551724 0.89655172 1. 1. 0.86666667 0.96551724
0.84375 0.90322581 0.93103448 0.96428571]
mean value: 0.9336548877059166
key: train_precision
value: [0.98319328 0.98431373 0.99583333 0.99176955 0.96911197 0.94402985
0.96911197 0.98069498 0.98062016 0.98804781]
mean value: 0.9786726616928444
key: test_recall
value: [1. 0.92857143 0.89655172 0.96551724 0.92857143 1.
0.96428571 1. 0.96428571 0.96428571]
mean value: 0.9612068965517242
key: train_recall
value: [0.92125984 0.98818898 0.94466403 0.95256917 0.98818898 0.99606299
0.98818898 1. 0.99606299 0.97637795]
mean value: 0.9751563910242446
key: test_roc_auc
value: [0.98275862 0.91256158 0.94827586 0.98275862 0.89285714 0.98214286
0.89285714 0.94642857 0.94642857 0.96428571]
mean value: 0.9451354679802956
key: train_roc_auc
value: [0.95272478 0.98618935 0.97036351 0.97234758 0.97834646 0.96850394
0.97834646 0.99015748 0.98818898 0.98228346]
mean value: 0.9767451993402011
key: test_jcc
value: [0.96551724 0.83870968 0.89655172 0.96551724 0.8125 0.96551724
0.81818182 0.90322581 0.9 0.93103448]
mean value: 0.8996755233087269
key: train_jcc
value: [0.90697674 0.97286822 0.94094488 0.94509804 0.95801527 0.94052045
0.95801527 0.98069498 0.97683398 0.96498054]
mean value: 0.9544948365069599
MCC on Blind test: 0.6
Accuracy on Blind test: 0.82
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02158213 0.01965857 0.02065611 0.01798463 0.01837373 0.01744699
0.01999712 0.0184269 0.02369738 0.02024031]
mean value: 0.019806385040283203
key: score_time
value: [0.01195168 0.01195455 0.01196837 0.01192522 0.01188898 0.011935
0.01193929 0.01191378 0.01197457 0.01197529]
mean value: 0.011942672729492187
key: test_mcc
value: [0.9321832 0.64058163 0.89952865 1. 0.75434227 0.89802651
0.74535599 0.8660254 0.75047877 0.93094934]
mean value: 0.841747176363628
key: train_mcc
value: [0.94524716 0.82552467 0.88136732 0.90342654 0.89677099 0.92228969
0.92779624 0.92985478 0.98038334 0.89990029]
mean value: 0.9112561021351867
key: test_accuracy
value: [0.96491228 0.78947368 0.94736842 1. 0.875 0.94642857
0.85714286 0.92857143 0.875 0.96428571]
mean value: 0.9148182957393484
key: train_accuracy
value: [0.97238659 0.90532544 0.93885602 0.95069034 0.94685039 0.96062992
0.96259843 0.96456693 0.99015748 0.9488189 ]
mean value: 0.9540880429887092
key: test_fscore
value: [0.96551724 0.82352941 0.95081967 1. 0.88135593 0.94339623
0.875 0.93333333 0.87272727 0.96551724]
mean value: 0.9211196331333564
key: train_fscore
value: [0.972 0.91366906 0.94139887 0.95219885 0.9489603 0.95967742
0.96394687 0.96525097 0.99021526 0.95057034]
mean value: 0.9557887945831837
key: test_precision
value: [0.93333333 0.7 0.90625 1. 0.83870968 1.
0.77777778 0.875 0.88888889 0.93333333]
mean value: 0.8853293010752689
key: train_precision
value: [0.98780488 0.8410596 0.90217391 0.92222222 0.91272727 0.98347107
0.93040293 0.9469697 0.9844358 0.91911765]
mean value: 0.9330385035167746
key: test_recall
value: [1. 1. 1. 1. 0.92857143 0.89285714
1. 1. 0.85714286 1. ]
mean value: 0.9678571428571429
key: train_recall
value: [0.95669291 1. 0.98418972 0.98418972 0.98818898 0.93700787
1. 0.98425197 0.99606299 0.98425197]
mean value: 0.9814836139553702
key: test_roc_auc
value: [0.96551724 0.79310345 0.94642857 1. 0.875 0.94642857
0.85714286 0.92857143 0.875 0.96428571]
mean value: 0.9151477832512316
key: train_roc_auc
value: [0.9724176 0.90513834 0.93894526 0.95075628 0.94685039 0.96062992
0.96259843 0.96456693 0.99015748 0.9488189 ]
mean value: 0.9540879524446796
key: test_jcc
value: [0.93333333 0.7 0.90625 1. 0.78787879 0.89285714
0.77777778 0.875 0.77419355 0.93333333]
mean value: 0.8580623923567472
key: train_jcc
value: [0.94552529 0.8410596 0.88928571 0.90875912 0.9028777 0.92248062
0.93040293 0.93283582 0.98062016 0.9057971 ]
mean value: 0.9159644058634359
MCC on Blind test: 0.73
Accuracy on Blind test: 0.9
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.18949628 0.16845989 0.17046356 0.17315173 0.18008161 0.17156911
0.17026114 0.16903186 0.18008924 0.17638707]
mean value: 0.17489914894104003
key: score_time
value: [0.01661611 0.01543522 0.01551104 0.01576376 0.01544642 0.01590562
0.01548719 0.01546669 0.01654816 0.01654792]
mean value: 0.015872812271118163
key: test_mcc
value: [1. 0.96551724 1. 0.96547546 0.83484711 1.
0.96490128 0.93094934 0.96490128 1. ]
mean value: 0.962659170679551
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.98245614 1. 0.98245614 0.91071429 1.
0.98214286 0.96428571 0.98214286 1. ]
mean value: 0.9804197994987468
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.98245614 1. 0.98305085 0.91803279 1.
0.98245614 0.96551724 0.98245614 1. ]
mean value: 0.9813969296774815
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96551724 1. 0.96666667 0.84848485 1.
0.96551724 0.93333333 0.96551724 1. ]
mean value: 0.9645036572622779
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.98275862 1. 0.98214286 0.91071429 1.
0.98214286 0.96428571 0.98214286 1. ]
mean value: 0.9804187192118227
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.96551724 1. 0.96666667 0.84848485 1.
0.96551724 0.93333333 0.96551724 1. ]
mean value: 0.9645036572622779
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.88
Accuracy on Blind test: 0.96
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05747509 0.07012534 0.08541584 0.09682059 0.08754301 0.08933735
0.07895207 0.0594759 0.07882285 0.08167696]
mean value: 0.07856450080871583
key: score_time
value: [0.01824808 0.03937316 0.03136182 0.03505492 0.03816462 0.03392482
0.03387403 0.02909517 0.02104235 0.0223937 ]
mean value: 0.030253267288208006
key: test_mcc
value: [1. 0.8951918 1. 0.96547546 0.83484711 1.
0.93094934 0.93094934 0.96490128 1. ]
mean value: 0.9522314322910705
key: train_mcc
value: [1. 0.99211042 0.99214142 0.99606299 1. 1.
0.99607071 0.99607071 0.99607071 0.99215674]
mean value: 0.9960683715267692
key: test_accuracy
value: [1. 0.94736842 1. 0.98245614 0.91071429 1.
0.96428571 0.96428571 0.98214286 1. ]
mean value: 0.9751253132832081
key: train_accuracy
value: [1. 0.99605523 0.99605523 0.99802761 1. 1.
0.9980315 0.9980315 0.9980315 0.99606299]
mean value: 0.9980295547376105
key: test_fscore
value: [1. 0.94545455 1. 0.98305085 0.91803279 1.
0.96551724 0.96551724 0.98245614 1. ]
mean value: 0.9760028802906916
key: train_fscore
value: [1. 0.99606299 0.99606299 0.99802761 1. 1.
0.99803536 0.99803536 0.99803536 0.99607843]
mean value: 0.9980338119410027
key: test_precision
value: [1. 0.96296296 1. 0.96666667 0.84848485 1.
0.93333333 0.93333333 0.96551724 1. ]
mean value: 0.9610298386160455
key: train_precision
value: [1. 0.99606299 0.99215686 0.99606299 1. 1.
0.99607843 0.99607843 0.99607843 0.9921875 ]
mean value: 0.9964705641114714
key: test_recall
value: [1. 0.92857143 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9928571428571429
key: train_recall
value: [1. 0.99606299 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9996062992125985
key: test_roc_auc
value: [1. 0.94704433 1. 0.98214286 0.91071429 1.
0.96428571 0.96428571 0.98214286 1. ]
mean value: 0.9750615763546798
key: train_roc_auc
value: [1. 0.99605521 0.99606299 0.9980315 1. 1.
0.9980315 0.9980315 0.9980315 0.99606299]
mean value: 0.9980307179981949
key: test_jcc
value: [1. 0.89655172 1. 0.96666667 0.84848485 1.
0.93333333 0.93333333 0.96551724 1. ]
mean value: 0.9543887147335424
key: train_jcc
value: [1. 0.99215686 0.99215686 0.99606299 1. 1.
0.99607843 0.99607843 0.99607843 0.9921875 ]
mean value: 0.9960799511733828
MCC on Blind test: 0.87
Accuracy on Blind test: 0.96
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.17560363 0.20586491 0.16315055 0.14851332 0.21609211 0.3052907
0.25830936 0.24186087 0.25000715 0.23897648]
mean value: 0.220366907119751
key: score_time
value: [0.02612948 0.02948737 0.01560521 0.02712798 0.03579521 0.02561378
0.02915239 0.02643967 0.0419426 0.02583075]
mean value: 0.02831244468688965
key: test_mcc
value: [0.83797038 0.82942474 0.77728159 0.96547546 0.76225171 0.82195294
0.80439967 0.93094934 0.89342711 0.96490128]
mean value: 0.8588034211725772
key: train_mcc
value: [0.98434291 0.98434291 0.98823511 0.98823511 0.98437404 0.99607071
0.98437404 0.98437404 0.98825791 0.99215674]
mean value: 0.9874763527102344
key: test_accuracy
value: [0.9122807 0.9122807 0.87719298 0.98245614 0.875 0.91071429
0.89285714 0.96428571 0.94642857 0.98214286]
mean value: 0.925563909774436
key: train_accuracy
value: [0.99211045 0.99211045 0.99408284 0.99408284 0.99212598 0.9980315
0.99212598 0.99212598 0.99409449 0.99606299]
mean value: 0.9936953516905062
key: test_fscore
value: [0.91803279 0.91525424 0.89230769 0.98305085 0.8852459 0.9122807
0.90322581 0.96551724 0.94736842 0.98245614]
mean value: 0.9304739776566863
key: train_fscore
value: [0.9921875 0.9921875 0.99410609 0.99410609 0.9921875 0.99803536
0.9921875 0.9921875 0.99412916 0.99607843]
mean value: 0.9937392634089591
key: test_precision
value: [0.84848485 0.87096774 0.80555556 0.96666667 0.81818182 0.89655172
0.82352941 0.93333333 0.93103448 0.96551724]
mean value: 0.8859822824198275
key: train_precision
value: [0.98449612 0.98449612 0.98828125 0.98828125 0.98449612 0.99607843
0.98449612 0.98449612 0.98832685 0.9921875 ]
mean value: 0.9875635899776615
key: test_recall
value: [1. 0.96428571 1. 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9821428571428572
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9137931 0.91317734 0.875 0.98214286 0.875 0.91071429
0.89285714 0.96428571 0.94642857 0.98214286]
mean value: 0.9255541871921182
key: train_roc_auc
value: [0.99209486 0.99209486 0.99409449 0.99409449 0.99212598 0.9980315
0.99212598 0.99212598 0.99409449 0.99606299]
mean value: 0.9936945628831969
key: test_jcc
value: [0.84848485 0.84375 0.80555556 0.96666667 0.79411765 0.83870968
0.82352941 0.93333333 0.9 0.96551724]
mean value: 0.8719664381662598
key: train_jcc
value: [0.98449612 0.98449612 0.98828125 0.98828125 0.98449612 0.99607843
0.98449612 0.98449612 0.98832685 0.9921875 ]
mean value: 0.9875635899776615
MCC on Blind test: 0.38
Accuracy on Blind test: 0.8
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.67622995 0.6558485 0.66100764 0.65075469 0.66132975 0.65230036
0.65977788 0.65869689 0.66375089 0.65631771]
mean value: 0.6596014261245727
key: score_time
value: [0.00952291 0.00961614 0.0093987 0.00940108 0.00963163 0.00945449
0.00950718 0.00944567 0.00945544 0.0094955 ]
mean value: 0.009492874145507812
key: test_mcc
value: [1. 0.96551724 1. 1. 0.89802651 1.
0.93094934 0.93094934 0.96490128 1. ]
mean value: 0.9690343705369726
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.98245614 1. 1. 0.94642857 1.
0.96428571 0.96428571 0.98214286 1. ]
mean value: 0.9839598997493735
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.98245614 1. 1. 0.94915254 1.
0.96551724 0.96551724 0.98245614 1. ]
mean value: 0.9845099305833256
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96551724 1. 1. 0.90322581 1.
0.93333333 0.93333333 0.96551724 1. ]
mean value: 0.9700926955876901
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.98275862 1. 1. 0.94642857 1.
0.96428571 0.96428571 0.98214286 1. ]
mean value: 0.9839901477832512
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.96551724 1. 1. 0.90322581 1.
0.93333333 0.93333333 0.96551724 1. ]
mean value: 0.9700926955876901
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.96
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03066301 0.03192091 0.03159785 0.03225589 0.03112817 0.0315671
0.05233073 0.07486224 0.05127001 0.05236316]
mean value: 0.041995906829833986
key: score_time
value: [0.01229358 0.01268935 0.01368308 0.01742101 0.01394725 0.01394153
0.01992059 0.0179832 0.01861358 0.02085924]
mean value: 0.016135239601135255
key: test_mcc
value: [1. 0.96547546 0.96547546 1. 0.96490128 0.89342711
1. 1. 0.96490128 1. ]
mean value: 0.9754180588113227
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.98245614 0.98245614 1. 0.98214286 0.94642857
1. 1. 0.98214286 1. ]
mean value: 0.987562656641604
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.98181818 0.98305085 1. 0.98181818 0.94545455
1. 1. 0.98181818 1. ]
mean value: 0.9873959938366718
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.96666667 1. 1. 0.96296296
1. 1. 1. 1. ]
mean value: 0.9929629629629629
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96428571 1. 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9821428571428572
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.98214286 0.98214286 1. 0.98214286 0.94642857
1. 1. 0.98214286 1. ]
mean value: 0.9875
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.96428571 0.96666667 1. 0.96428571 0.89655172
1. 1. 0.96428571 1. ]
mean value: 0.9756075533661741
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.05
Accuracy on Blind test: 0.78
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.03958082 0.03368187 0.04933643 0.03995538 0.03961086 0.03955531
0.04075122 0.02445912 0.06261611 0.03506374]
mean value: 0.04046108722686768
key: score_time
value: [0.03454447 0.01914907 0.01997089 0.0189991 0.01908541 0.01905084
0.02588892 0.02134299 0.02253413 0.02127409]
mean value: 0.022183990478515624
key: test_mcc
value: [0.9321832 0.8615634 0.8953202 1. 0.82195294 1.
0.79385662 0.85933785 0.82195294 0.93094934]
mean value: 0.8917116490649181
key: train_mcc
value: [0.95269145 0.9605814 0.94872553 0.96055211 0.96853396 0.95670033
0.95675965 0.96853396 0.95670033 0.95670033]
mean value: 0.9586479053242231
key: test_accuracy
value: [0.96491228 0.92982456 0.94736842 1. 0.91071429 1.
0.89285714 0.92857143 0.91071429 0.96428571]
mean value: 0.9449248120300752
key: train_accuracy
value: [0.97633136 0.98027613 0.97435897 0.98027613 0.98425197 0.97834646
0.97834646 0.98425197 0.97834646 0.97834646]
mean value: 0.9793132367329823
key: test_fscore
value: [0.96551724 0.92592593 0.94736842 1. 0.9122807 1.
0.9 0.93103448 0.90909091 0.96551724]
mean value: 0.9456734923341094
key: train_fscore
value: [0.97647059 0.98039216 0.97435897 0.98023715 0.98431373 0.978389
0.97847358 0.98418972 0.97830375 0.978389 ]
mean value: 0.9793517647236116
key: test_precision
value: [0.93333333 0.96153846 0.96428571 1. 0.89655172 1.
0.84375 0.9 0.92592593 0.93333333]
mean value: 0.93587184925547
key: train_precision
value: [0.97265625 0.9765625 0.97244094 0.98023715 0.98046875 0.97647059
0.97276265 0.98809524 0.98023715 0.97647059]
mean value: 0.9776401813662509
key: test_recall
value: [1. 0.89285714 0.93103448 1. 0.92857143 1.
0.96428571 0.96428571 0.89285714 1. ]
mean value: 0.9573891625615764
key: train_recall
value: [0.98031496 0.98425197 0.97628458 0.98023715 0.98818898 0.98031496
0.98425197 0.98031496 0.97637795 0.98031496]
mean value: 0.9810852447791852
key: test_roc_auc
value: [0.96551724 0.92918719 0.9476601 1. 0.91071429 1.
0.89285714 0.92857143 0.91071429 0.96428571]
mean value: 0.9449507389162561
key: train_roc_auc
value: [0.97632349 0.98026828 0.97436276 0.98027606 0.98425197 0.97834646
0.97834646 0.98425197 0.97834646 0.97834646]
mean value: 0.9793120351062837
key: test_jcc
value: [0.93333333 0.86206897 0.9 1. 0.83870968 1.
0.81818182 0.87096774 0.83333333 0.93333333]
mean value: 0.8989928203053899
key: train_jcc
value: [0.95402299 0.96153846 0.95 0.96124031 0.96911197 0.95769231
0.95785441 0.9688716 0.95752896 0.95769231]
mean value: 0.9595553303608277
MCC on Blind test: 0.76
Accuracy on Blind test: 0.91
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.30773807 0.29322624 0.32750988 0.32462645 0.31639314 0.32397628
0.31056094 0.29872632 0.36270189 0.40175366]
mean value: 0.3267212867736816
key: score_time
value: [0.01916409 0.01916027 0.01916218 0.01910424 0.01910162 0.01980019
0.01962781 0.02352262 0.0193069 0.01963854]
mean value: 0.019758844375610353
key: test_mcc
value: [0.9321832 0.8615634 0.8953202 1. 0.75047877 1.
0.79385662 0.85933785 0.82195294 0.93094934]
mean value: 0.8845642321659994
key: train_mcc
value: [0.95269145 0.9605814 0.94872553 0.96055211 0.9645744 0.95670033
0.95675965 0.96853396 0.95670033 0.95670033]
mean value: 0.9582519495785499
key: test_accuracy
value: [0.96491228 0.92982456 0.94736842 1. 0.875 1.
0.89285714 0.92857143 0.91071429 0.96428571]
mean value: 0.9413533834586466
key: train_accuracy
value: [0.97633136 0.98027613 0.97435897 0.98027613 0.98228346 0.97834646
0.97834646 0.98425197 0.97834646 0.97834646]
mean value: 0.9791163863392816
key: test_fscore
value: [0.96551724 0.92592593 0.94736842 1. 0.87719298 1.
0.9 0.93103448 0.90909091 0.96551724]
mean value: 0.9421647204042848
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:148: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:151: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: train_fscore
value: [0.97647059 0.98039216 0.97435897 0.98023715 0.98231827 0.978389
0.97847358 0.98418972 0.97830375 0.978389 ]
mean value: 0.9791522192865763
key: test_precision
value: [0.93333333 0.96153846 0.96428571 1. 0.86206897 1.
0.84375 0.9 0.92592593 0.93333333]
mean value: 0.932423573393401
key: train_precision
value: [0.97265625 0.9765625 0.97244094 0.98023715 0.98039216 0.97647059
0.97276265 0.98809524 0.98023715 0.97647059]
mean value: 0.9776325220525254
key: test_recall
value: [1. 0.89285714 0.93103448 1. 0.89285714 1.
0.96428571 0.96428571 0.89285714 1. ]
mean value: 0.9538177339901478
key: train_recall
value: [0.98031496 0.98425197 0.97628458 0.98023715 0.98425197 0.98031496
0.98425197 0.98031496 0.97637795 0.98031496]
mean value: 0.9806915439917837
key: test_roc_auc
value: [0.96551724 0.92918719 0.9476601 1. 0.875 1.
0.89285714 0.92857143 0.91071429 0.96428571]
mean value: 0.9413793103448276
key: train_roc_auc
value: [0.97632349 0.98026828 0.97436276 0.98027606 0.98228346 0.97834646
0.97834646 0.98425197 0.97834646 0.97834646]
mean value: 0.979115184712583
key: test_jcc
value: [0.93333333 0.86206897 0.9 1. 0.78125 1.
0.81818182 0.87096774 0.83333333 0.93333333]
mean value: 0.8932468525634544
key: train_jcc
value: [0.95402299 0.96153846 0.95 0.96124031 0.96525097 0.95769231
0.95785441 0.9688716 0.95752896 0.95769231]
mean value: 0.9591692299747274
MCC on Blind test: 0.76
Accuracy on Blind test: 0.91
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02975988 0.03407001 0.0321157 0.04390645 0.05316114 0.07947993
0.02710176 0.03776312 0.04052353 0.02836466]
mean value: 0.04062461853027344
key: score_time
value: [0.01219678 0.01215196 0.01284242 0.01259923 0.01233125 0.01213431
0.01271582 0.01214218 0.01203442 0.0119319 ]
mean value: 0.012308025360107422
key: test_mcc
value: [0.62994079 1. 0.6000992 0.66143783 0.87287156 0.87287156
0.32732684 0.66143783 0.32732684 0.53452248]
mean value: 0.6487834918450751
key: train_mcc
value: [0.8979331 0.88273483 0.89791134 0.86948194 0.8687127 0.8687127
0.91277477 0.91240409 0.8687127 0.91277477]
mean value: 0.8892152945204359
key: test_accuracy
value: [0.8125 1. 0.8 0.8 0.93333333 0.93333333
0.66666667 0.8 0.66666667 0.73333333]
mean value: 0.8145833333333333
key: train_accuracy
value: [0.94852941 0.94117647 0.94890511 0.93430657 0.93430657 0.93430657
0.95620438 0.95620438 0.93430657 0.95620438]
mean value: 0.9444450407900387
key: test_fscore
value: [0.82352941 1. 0.76923077 0.82352941 0.92307692 0.92307692
0.70588235 0.76923077 0.70588235 0.8 ]
mean value: 0.8243438914027149
key: train_fscore
value: [0.94736842 0.94029851 0.94890511 0.93333333 0.93430657 0.93430657
0.95522388 0.95588235 0.93430657 0.95522388]
mean value: 0.9439155193502107
key: test_precision
value: [0.77777778 1. 0.83333333 0.7 1. 1.
0.66666667 1. 0.66666667 0.66666667]
mean value: 0.8311111111111111
key: train_precision
value: [0.96923077 0.95454545 0.95588235 0.95454545 0.94117647 0.94117647
0.96969697 0.95588235 0.92753623 0.96969697]
mean value: 0.95393694966585
key: test_recall
value: [0.875 1. 0.71428571 1. 0.85714286 0.85714286
0.75 0.625 0.75 1. ]
mean value: 0.8428571428571429
key: train_recall
value: [0.92647059 0.92647059 0.94202899 0.91304348 0.92753623 0.92753623
0.94117647 0.95588235 0.94117647 0.94117647]
mean value: 0.9342497868712702
key: test_roc_auc
value: [0.8125 1. 0.79464286 0.8125 0.92857143 0.92857143
0.66071429 0.8125 0.66071429 0.71428571]
mean value: 0.8125
key: train_roc_auc
value: [0.94852941 0.94117647 0.94895567 0.93446292 0.93435635 0.93435635
0.95609548 0.95620205 0.93435635 0.95609548]
mean value: 0.944458653026428
key: test_jcc
value: [0.7 1. 0.625 0.7 0.85714286 0.85714286
0.54545455 0.625 0.54545455 0.66666667]
mean value: 0.7121861471861471
key: train_jcc
value: [0.9 0.88732394 0.90277778 0.875 0.87671233 0.87671233
0.91428571 0.91549296 0.87671233 0.91428571]
mean value: 0.8939303094059027
MCC on Blind test: 0.67
Accuracy on Blind test: 0.87
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.8173337 1.18391466 0.87638569 1.19149208 0.98477817 0.98145533
1.09080052 0.8652916 0.64874887 0.81251073]
mean value: 0.9452711343765259
key: score_time
value: [0.01361656 0.01263309 0.01250887 0.0123415 0.0226686 0.01354671
0.01257586 0.01254773 0.01364255 0.01389027]
mean value: 0.013997173309326172
key: test_mcc
value: [0.62994079 1. 0.6000992 0.76376262 0.87287156 0.73214286
0.32732684 0.66143783 0.73214286 0.53452248]
mean value: 0.6854247024498333
key: train_mcc
value: [0.94158382 0.94117647 0.95630861 0.97122151 0.91240409 1.
0.8251228 0.95629932 1. 0.98550418]
mean value: 0.9489620795067616
key: test_accuracy
value: [0.8125 1. 0.8 0.86666667 0.93333333 0.86666667
0.66666667 0.8 0.86666667 0.73333333]
mean value: 0.8345833333333333
key: train_accuracy
value: [0.97058824 0.97058824 0.97810219 0.98540146 0.95620438 1.
0.91240876 0.97810219 1. 0.99270073]
mean value: 0.9744096178617433
key: test_fscore
value: [0.82352941 1. 0.76923077 0.875 0.92307692 0.85714286
0.70588235 0.76923077 0.875 0.8 ]
mean value: 0.8398093083387201
key: train_fscore
value: [0.97014925 0.97058824 0.97810219 0.98529412 0.95652174 1.
0.91044776 0.97777778 1. 0.99259259]
mean value: 0.9741473667148377
key: test_precision
value: [0.77777778 1. 0.83333333 0.77777778 1. 0.85714286
0.66666667 1. 0.875 0.66666667]
mean value: 0.8454365079365079
key: train_precision
value: [0.98484848 0.97058824 0.98529412 1. 0.95652174 1.
0.92424242 0.98507463 1. 1. ]
mean value: 0.9806569628028192
key: test_recall
value: [0.875 1. 0.71428571 1. 0.85714286 0.85714286
0.75 0.625 0.875 1. ]
mean value: 0.8553571428571428
key: train_recall
value: [0.95588235 0.97058824 0.97101449 0.97101449 0.95652174 1.
0.89705882 0.97058824 1. 0.98529412]
mean value: 0.9677962489343563
key: test_roc_auc
value: [0.8125 1. 0.79464286 0.875 0.92857143 0.86607143
0.66071429 0.8125 0.86607143 0.71428571]
mean value: 0.8330357142857143
key: train_roc_auc
value: [0.97058824 0.97058824 0.97815431 0.98550725 0.95620205 1.
0.91229753 0.97804774 1. 0.99264706]
mean value: 0.9744032395566923
key: test_jcc
value: [0.7 1. 0.625 0.77777778 0.85714286 0.75
0.54545455 0.625 0.77777778 0.66666667]
mean value: 0.7324819624819625
key: train_jcc
value: [0.94202899 0.94285714 0.95714286 0.97101449 0.91666667 1.
0.83561644 0.95652174 1. 0.98529412]
mean value: 0.9507142440061194
MCC on Blind test: 0.73
Accuracy on Blind test: 0.9
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01516128 0.01172233 0.00925016 0.00884438 0.00867152 0.00968885
0.00902534 0.0096035 0.00953603 0.00910902]
mean value: 0.010061240196228028
key: score_time
value: [0.01382613 0.00929689 0.00917935 0.00864553 0.0086813 0.00937629
0.00872087 0.00905013 0.00939441 0.00907898]
mean value: 0.009524989128112792
key: test_mcc
value: [0.40451992 0.67419986 0.34247476 0.47245559 0.41931393 0.34247476
0.09449112 0.49099025 0.33928571 0.21821789]
mean value: 0.3798423801118058
key: train_mcc
value: [0.57294631 0.54111596 0.60664573 0.55776902 0.59674775 0.57705251
0.64235303 0.72621377 0.73836136 0.5626648 ]
mean value: 0.6121870240894057
key: test_accuracy
value: [0.6875 0.8125 0.66666667 0.73333333 0.66666667 0.66666667
0.53333333 0.73333333 0.66666667 0.6 ]
mean value: 0.6766666666666666
key: train_accuracy
value: [0.76470588 0.75 0.78832117 0.75912409 0.77372263 0.76642336
0.81021898 0.86131387 0.86861314 0.75912409]
mean value: 0.7901567196221554
key: test_fscore
value: [0.61538462 0.76923077 0.54545455 0.66666667 0.44444444 0.54545455
0.46153846 0.71428571 0.66666667 0.57142857]
mean value: 0.6000555000555
key: train_fscore
value: [0.70909091 0.69090909 0.75213675 0.7079646 0.72072072 0.71428571
0.77966102 0.85271318 0.86363636 0.69724771]
mean value: 0.7488366054215206
key: test_precision
value: [0.8 1. 0.75 0.8 1. 0.75
0.6 0.83333333 0.71428571 0.66666667]
mean value: 0.7914285714285715
key: train_precision
value: [0.92857143 0.9047619 0.91666667 0.90909091 0.95238095 0.93023256
0.92 0.90163934 0.890625 0.92682927]
mean value: 0.9180798032166374
key: test_recall
value: [0.5 0.625 0.42857143 0.57142857 0.28571429 0.42857143
0.375 0.625 0.625 0.5 ]
mean value: 0.49642857142857144
key: train_recall
value: [0.57352941 0.55882353 0.63768116 0.57971014 0.57971014 0.57971014
0.67647059 0.80882353 0.83823529 0.55882353]
mean value: 0.639151747655584
key: test_roc_auc
value: [0.6875 0.8125 0.65178571 0.72321429 0.64285714 0.65178571
0.54464286 0.74107143 0.66964286 0.60714286]
mean value: 0.6732142857142858
key: train_roc_auc
value: [0.76470588 0.75 0.78942882 0.76044331 0.77514919 0.76779625
0.80924979 0.8609335 0.86839301 0.75767263]
mean value: 0.7903772378516624
key: test_jcc
value: [0.44444444 0.625 0.375 0.5 0.28571429 0.375
0.3 0.55555556 0.5 0.4 ]
mean value: 0.43607142857142855
key: train_jcc
value: [0.54929577 0.52777778 0.60273973 0.54794521 0.56338028 0.55555556
0.63888889 0.74324324 0.76 0.53521127]
mean value: 0.6024037720915977
MCC on Blind test: 0.28
Accuracy on Blind test: 0.78
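Each "mean value" line is the arithmetic mean of the ten per-fold entries in the preceding "value" array. The sketch below checks this for the Gaussian NB MCC scores and computes the train/test gap. It is a minimal illustration using only the standard library; `parse_fold_values` is a hypothetical helper, and the arrays are copied from the log as printed (rounded to 8 decimals), so results agree with the logged means only to about that precision.

```python
from statistics import mean, stdev


def parse_fold_values(text):
    """Parse a NumPy-style printed array, e.g. '[0.1 0.2\n 0.3]', into floats."""
    return [float(tok) for tok in text.replace("[", " ").replace("]", " ").split()]


# Copied from the Gaussian NB block of the log.
test_mcc = parse_fold_values(
    """[0.40451992 0.67419986 0.34247476 0.47245559 0.41931393 0.34247476
 0.09449112 0.49099025 0.33928571 0.21821789]"""
)
train_mcc = parse_fold_values(
    """[0.57294631 0.54111596 0.60664573 0.55776902 0.59674775 0.57705251
 0.64235303 0.72621377 0.73836136 0.5626648 ]"""
)

test_mean = mean(test_mcc)    # logged mean value: 0.3798423801118058
train_mean = mean(train_mcc)  # logged mean value: 0.6121870240894057

print(f"test_mcc  mean={test_mean:.4f} sd={stdev(test_mcc):.4f}")
print(f"train_mcc mean={train_mean:.4f} sd={stdev(train_mcc):.4f}")
# A large positive gap suggests the model fits the training folds
# better than it generalises to the held-out fold.
print(f"train-test gap: {train_mean - test_mean:.4f}")
```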
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00991607 0.00885415 0.0101788 0.01003098 0.00886106 0.00952673
0.00979686 0.010741 0.01050067 0.01020956]
mean value: 0.009861588478088379
key: score_time
value: [0.00879622 0.0093503 0.00995898 0.00946927 0.00864244 0.00944114
0.00965357 0.01037049 0.00998425 0.00960541]
mean value: 0.009527206420898438
key: test_mcc
value: [ 0.40451992 0.67419986 0.46428571 0.75592895 0.6000992 0.46428571
-0.19642857 0.37796447 0.47245559 0.32732684]
mean value: 0.43446376808762277
key: train_mcc
value: [0.67911938 0.60616144 0.68986702 0.63862773 0.65701381 0.57996733
0.66746486 0.62437433 0.66581484 0.640228 ]
mean value: 0.6448638739267128
key: test_accuracy
value: [0.6875 0.8125 0.73333333 0.86666667 0.8 0.73333333
0.4 0.66666667 0.73333333 0.66666667]
mean value: 0.71
key: train_accuracy
value: [0.83823529 0.80147059 0.83941606 0.81751825 0.82481752 0.78832117
0.83211679 0.81021898 0.83211679 0.81751825]
mean value: 0.8201749677973379
key: test_fscore
value: [0.61538462 0.76923077 0.71428571 0.83333333 0.76923077 0.71428571
0.4 0.61538462 0.77777778 0.70588235]
mean value: 0.6914795661854485
key: train_fscore
value: [0.83076923 0.79069767 0.82539683 0.80916031 0.8125 0.77862595
0.82170543 0.796875 0.82442748 0.80314961]
mean value: 0.8093307503698478
key: test_precision
value: [0.8 1. 0.71428571 1. 0.83333333 0.71428571
0.42857143 0.8 0.7 0.66666667]
mean value: 0.7657142857142857
key: train_precision
value: [0.87096774 0.83606557 0.9122807 0.85483871 0.88135593 0.82258065
0.86885246 0.85 0.85714286 0.86440678]
mean value: 0.8618491400322729
key: test_recall
value: [0.5 0.625 0.71428571 0.71428571 0.71428571 0.71428571
0.375 0.5 0.875 0.75 ]
mean value: 0.6482142857142857
key: train_recall
value: [0.79411765 0.75 0.75362319 0.76811594 0.75362319 0.73913043
0.77941176 0.75 0.79411765 0.75 ]
mean value: 0.7632139812446718
key: test_roc_auc
value: [0.6875 0.8125 0.73214286 0.85714286 0.79464286 0.73214286
0.40178571 0.67857143 0.72321429 0.66071429]
mean value: 0.7080357142857143
key: train_roc_auc
value: [0.83823529 0.80147059 0.84004689 0.8178815 0.82534101 0.78868286
0.83173487 0.80978261 0.83184143 0.81702899]
mean value: 0.8202046035805627
key: test_jcc
value: [0.44444444 0.625 0.55555556 0.71428571 0.625 0.55555556
0.25 0.44444444 0.63636364 0.54545455]
mean value: 0.5396103896103897
key: train_jcc
value: [0.71052632 0.65384615 0.7027027 0.67948718 0.68421053 0.6375
0.69736842 0.66233766 0.7012987 0.67105263]
mean value: 0.6800330294409241
MCC on Blind test: 0.26
Accuracy on Blind test: 0.69
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00962687 0.0089643 0.00940514 0.00952768 0.00951171 0.00911784
0.00948524 0.00956416 0.01086783 0.00937128]
mean value: 0.00954420566558838
key: score_time
value: [0.01055789 0.01027226 0.01053548 0.010499 0.01059937 0.00998354
0.01044297 0.01039433 0.01533222 0.01476574]
mean value: 0.011338281631469726
key: test_mcc
value: [ 0.25 0.12598816 0.32732684 0.60714286 0.64465837 0.73214286
-0.07142857 0.33928571 0.21821789 0.75592895]
mean value: 0.3929263057641339
key: train_mcc
value: [0.62196632 0.60300638 0.6647466 0.59240339 0.60592498 0.5644673
0.66581484 0.60584099 0.61074523 0.640228 ]
mean value: 0.6175144024127495
key: test_accuracy
value: [0.625 0.5625 0.66666667 0.8 0.8 0.86666667
0.46666667 0.66666667 0.6 0.86666667]
mean value: 0.6920833333333334
key: train_accuracy
value: [0.80882353 0.80147059 0.83211679 0.79562044 0.80291971 0.7810219
0.83211679 0.80291971 0.80291971 0.81751825]
mean value: 0.8077447402318592
key: test_fscore
value: [0.625 0.58823529 0.61538462 0.8 0.72727273 0.85714286
0.5 0.66666667 0.57142857 0.88888889]
mean value: 0.6840019620901974
key: train_fscore
value: [0.796875 0.8 0.83687943 0.79104478 0.80291971 0.77272727
0.82442748 0.8 0.78740157 0.80314961]
mean value: 0.8015424851518379
key: test_precision
value: [0.625 0.55555556 0.66666667 0.75 1. 0.85714286
0.5 0.71428571 0.66666667 0.8 ]
mean value: 0.7135317460317461
key: train_precision
value: [0.85 0.80597015 0.81944444 0.81538462 0.80882353 0.80952381
0.85714286 0.80597015 0.84745763 0.86440678]
mean value: 0.8284123961194615
key: test_recall
value: [0.625 0.625 0.57142857 0.85714286 0.57142857 0.85714286
0.5 0.625 0.5 1. ]
mean value: 0.6732142857142857
key: train_recall
value: [0.75 0.79411765 0.85507246 0.76811594 0.79710145 0.73913043
0.79411765 0.79411765 0.73529412 0.75 ]
mean value: 0.7777067348678601
key: test_roc_auc
value: [0.625 0.5625 0.66071429 0.80357143 0.78571429 0.86607143
0.46428571 0.66964286 0.60714286 0.85714286]
mean value: 0.6901785714285714
key: train_roc_auc
value: [0.80882353 0.80147059 0.831948 0.79582268 0.80296249 0.78132992
0.83184143 0.80285592 0.80242967 0.81702899]
mean value: 0.8076513213981245
key: test_jcc
value: [0.45454545 0.41666667 0.44444444 0.66666667 0.57142857 0.75
0.33333333 0.5 0.4 0.8 ]
mean value: 0.5337085137085137
key: train_jcc
value: [0.66233766 0.66666667 0.7195122 0.65432099 0.67073171 0.62962963
0.7012987 0.66666667 0.64935065 0.67105263]
mean value: 0.6691567497622268
MCC on Blind test: 0.26
Accuracy on Blind test: 0.64
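To compare models on the blind test set without reading the whole log, the "Model_name" and "... on Blind test" lines can be collected with a small parser. This is a sketch under assumptions: `blind_test_scores` is a hypothetical helper, the excerpt below keeps only the relevant lines from the three preceding blocks, and the line formats are taken from this log rather than from the script's source.

```python
import re

# Abbreviated excerpt: only the lines the parser needs, copied from the log.
LOG_EXCERPT = """\
Model_name: Gaussian NB
Model func: GaussianNB()
MCC on Blind test: 0.28
Accuracy on Blind test: 0.78
Model_name: Naive Bayes
Model func: BernoulliNB()
MCC on Blind test: 0.26
Accuracy on Blind test: 0.69
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
MCC on Blind test: 0.26
Accuracy on Blind test: 0.64
"""

NAME_RE = re.compile(r"Model_name:\s*(.+)")
SCORE_RE = re.compile(r"(MCC|Accuracy) on Blind test:\s*([-\d.]+)")


def blind_test_scores(log_text):
    """Map each model name to its blind-test MCC and accuracy."""
    scores = {}
    current = None
    for line in log_text.splitlines():
        name_match = NAME_RE.match(line)
        if name_match:
            current = name_match.group(1).strip()
            # The model list contains 'Naive Bayes' twice, so a later
            # block with the same name would overwrite the earlier one.
            scores[current] = {}
            continue
        score_match = SCORE_RE.match(line)
        if current is not None and score_match:
            scores[current][score_match.group(1).lower()] = float(score_match.group(2))
    return scores


scores = blind_test_scores(LOG_EXCERPT)
ranked = sorted(scores.items(), key=lambda kv: kv[1]["mcc"], reverse=True)
for name, s in ranked:
    print(f"{name}: MCC={s['mcc']:.2f} accuracy={s['accuracy']:.2f}")
```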
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01161861 0.01145601 0.01143718 0.01146555 0.01147413 0.01156044
0.01610374 0.01574993 0.01282072 0.01037717]
mean value: 0.012406349182128906
key: score_time
value: [0.01001382 0.01000714 0.00987005 0.00989294 0.01008201 0.00988507
0.01483679 0.01519465 0.01058769 0.00996661]
mean value: 0.01103367805480957
key: test_mcc
value: [0.75 0.5 0.6000992 0.56407607 0.6000992 0.87287156
0.18898224 0.66143783 0.18898224 0.53452248]
mean value: 0.5461070816659918
key: train_mcc
value: [0.83905224 0.76503685 0.83951407 0.79590547 0.83951407 0.79599234
0.85434012 0.83951407 0.8251228 0.81031543]
mean value: 0.8204307458510867
key: test_accuracy
value: [0.875 0.75 0.8 0.73333333 0.8 0.93333333
0.6 0.8 0.6 0.73333333]
mean value: 0.7625
key: train_accuracy
value: [0.91911765 0.88235294 0.91970803 0.89781022 0.91970803 0.89781022
0.9270073 0.91970803 0.91240876 0.90510949]
mean value: 0.9100740661227995
key: test_fscore
value: [0.875 0.75 0.76923077 0.77777778 0.76923077 0.92307692
0.66666667 0.76923077 0.66666667 0.8 ]
mean value: 0.7766880341880341
key: train_fscore
value: [0.91729323 0.88405797 0.91970803 0.9 0.91970803 0.89705882
0.92537313 0.91970803 0.91044776 0.90510949]
mean value: 0.9098464499791336
key: test_precision
value: [0.875 0.75 0.83333333 0.63636364 0.83333333 1.
0.6 1. 0.6 0.66666667]
mean value: 0.7794696969696969
key: train_precision
value: [0.93846154 0.87142857 0.92647059 0.88732394 0.92647059 0.91044776
0.93939394 0.91304348 0.92424242 0.89855072]
mean value: 0.9135833557751614
key: test_recall
value: [0.875 0.75 0.71428571 1. 0.71428571 0.85714286
0.75 0.625 0.75 1. ]
mean value: 0.8035714285714286
key: train_recall
value: [0.89705882 0.89705882 0.91304348 0.91304348 0.91304348 0.88405797
0.91176471 0.92647059 0.89705882 0.91176471]
mean value: 0.9064364876385337
key: test_roc_auc
value: [0.875 0.75 0.79464286 0.75 0.79464286 0.92857143
0.58928571 0.8125 0.58928571 0.71428571]
mean value: 0.7598214285714285
key: train_roc_auc
value: [0.91911765 0.88235294 0.91975703 0.89769821 0.91975703 0.89791134
0.92689685 0.91975703 0.91229753 0.90515772]
mean value: 0.9100703324808184
key: test_jcc
value: [0.77777778 0.6 0.625 0.63636364 0.625 0.85714286
0.5 0.625 0.5 0.66666667]
mean value: 0.6412950937950938
key: train_jcc
value: [0.84722222 0.79220779 0.85135135 0.81818182 0.85135135 0.81333333
0.86111111 0.85135135 0.83561644 0.82666667]
mean value: 0.8348393436133162
MCC on Blind test: 0.48
Accuracy on Blind test: 0.76
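Note on reading the per-fold metrics: the first SVM fold above reports test_mcc 0.75, test_accuracy 0.875, test_precision 0.875, test_recall 0.875 and test_jcc 0.77777778. The log does not print a confusion matrix, so the counts below (TP=7, FP=1, FN=1, TN=7 in a 16-sample fold) are an inference that happens to reproduce those numbers, not values taken from the run. The sketch also assumes that "jcc" stands for the Jaccard index. The helper names are hypothetical.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def jaccard(tp, fp, fn):
    """Jaccard index of the positive class."""
    return tp / (tp + fp + fn)


# Assumed counts for one 16-sample fold (inferred, not printed in the log).
tp, fp, fn, tn = 7, 1, 1, 7

fold_mcc = mcc(tp, fp, fn, tn)
fold_jcc = jaccard(tp, fp, fn)
fold_acc = (tp + tn) / (tp + fp + fn + tn)
fold_precision = tp / (tp + fp)
fold_recall = tp / (tp + fn)

print(fold_mcc, fold_acc, fold_precision, fold_recall, fold_jcc)
```

With only 15 or 16 samples per fold, a single reclassified sample moves accuracy by about 0.06, which is one reason the per-fold test values in this log vary so widely.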
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.70389104 0.67864847 1.09095931 0.99975729 0.98860121 0.84449649
0.65940809 0.66229033 0.58026719 0.63522124]
mean value: 0.7843540668487549
key: score_time
value: [0.01230264 0.01353049 0.01278973 0.01525879 0.0178535 0.0141089
0.01234412 0.01248693 0.01246476 0.01245332]
mean value: 0.013559317588806153
key: test_mcc
value: [0.37796447 0.75 0.32732684 0.46770717 0.6000992 0.32732684
0.47245559 0.66143783 0.46428571 0.53452248]
mean value: 0.4983126132351171
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.6875 0.875 0.66666667 0.66666667 0.8 0.66666667
0.73333333 0.8 0.73333333 0.73333333]
mean value: 0.73625
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.70588235 0.875 0.61538462 0.73684211 0.76923077 0.61538462
0.77777778 0.76923077 0.75 0.8 ]
mean value: 0.7414733005212881
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.875 0.66666667 0.58333333 0.83333333 0.66666667
0.7 1. 0.75 0.66666667]
mean value: 0.7408333333333333
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.875 0.57142857 1. 0.71428571 0.57142857
0.875 0.625 0.75 1. ]
mean value: 0.7732142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.6875 0.875 0.66071429 0.6875 0.79464286 0.66071429
0.72321429 0.8125 0.73214286 0.71428571]
mean value: 0.7348214285714285
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.54545455 0.77777778 0.44444444 0.58333333 0.625 0.44444444
0.63636364 0.625 0.6 0.66666667]
mean value: 0.5948484848484848
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.5
Accuracy on Blind test: 0.79
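Note on comparing train and test scores: each record in this log has the shape "key: / value: [...] / mean value:". A minimal sketch of reading such records back and computing the train/test gap follows. The function name and the regular expression are assumptions made for illustration, and the sample text is copied from the MLP records above.

```python
import re

# One record: "key: NAME", "value: [ ... ]" (may wrap across lines), "mean value: X".
RECORD = re.compile(r"key:\s*(\S+)\s*value:\s*\[([^\]]*)\]\s*mean value:\s*(\S+)")


def parse_cv_records(text):
    """Return {key: (per-fold values, reported mean)} for every complete record."""
    records = {}
    for key, raw_values, raw_mean in RECORD.findall(text):
        values = [float(token) for token in raw_values.split()]
        records[key] = (values, float(raw_mean))
    return records


sample = """key: test_mcc
value: [0.37796447 0.75 0.32732684 0.46770717 0.6000992 0.32732684
0.47245559 0.66143783 0.46428571 0.53452248]
mean value: 0.4983126132351171
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
"""

records = parse_cv_records(sample)
test_values, test_mean = records["test_mcc"]
train_values, train_mean = records["train_mcc"]

# Recompute the mean from the per-fold values and compare train against test.
recomputed_test_mean = sum(test_values) / len(test_values)
gap = train_mean - test_mean
print(f"test_mcc mean={recomputed_test_mean:.4f}  train-test gap={gap:.4f}")
```

A train_mcc of 1.0 against a test_mcc mean near 0.50 is the pattern usually associated with overfitting. A record that has warning text between its "value:" and "mean value:" lines will not match this pattern and is skipped.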
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01449776 0.01498055 0.01225662 0.01179147 0.01136565 0.01139188
0.0118866 0.01129723 0.01635385 0.01810718]
mean value: 0.013392877578735352
key: score_time
value: [0.01178002 0.01000428 0.00914884 0.00882721 0.00953579 0.00952053
0.00946212 0.00917983 0.01388979 0.0131166 ]
mean value: 0.010446500778198243
key: test_mcc
value: [0.77459667 0.8819171 0.875 1. 1. 0.87287156
0.75592895 0.87287156 0.875 1. ]
mean value: 0.8908185840836074
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.9375 0.93333333 1. 1. 0.93333333
0.86666667 0.93333333 0.93333333 1. ]
mean value: 0.94125
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.93333333 0.93333333 1. 1. 0.92307692
0.88888889 0.94117647 0.93333333 1. ]
mean value: 0.9410285139696904
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.875 1. 1. 1.
0.8 0.88888889 1. 1. ]
mean value: 0.9563888888888888
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.875 1. 1. 1. 0.85714286
1. 1. 0.875 1. ]
mean value: 0.9357142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.9375 0.9375 1. 1. 0.92857143
0.85714286 0.92857143 0.9375 1. ]
mean value: 0.9401785714285714
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.875 0.875 1. 1. 0.85714286
0.8 0.88888889 0.875 1. ]
mean value: 0.8921031746031746
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.86
Accuracy on Blind test: 0.94
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.09935236 0.10409498 0.12266827 0.1061933 0.09845328 0.10499287
0.09967995 0.09435582 0.08926463 0.09026933]
mean value: 0.10093247890472412
key: score_time
value: [0.02710509 0.01923895 0.01924515 0.01959372 0.01917553 0.02634406
0.01896572 0.01882887 0.01784062 0.01771426]
mean value: 0.02040519714355469
key: test_mcc
value: [0.5 0.62994079 0.6000992 0.49099025 0.87287156 0.73214286
0.19642857 0.66143783 0.47245559 0.53452248]
mean value: 0.5690889131896603
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75 0.8125 0.8 0.73333333 0.93333333 0.86666667
0.6 0.8 0.73333333 0.73333333]
mean value: 0.77625
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75 0.82352941 0.76923077 0.75 0.92307692 0.85714286
0.625 0.76923077 0.77777778 0.8 ]
mean value: 0.7844988508223802
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.77777778 0.83333333 0.66666667 1. 0.85714286
0.625 1. 0.7 0.66666667]
mean value: 0.7876587301587301
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.875 0.71428571 0.85714286 0.85714286 0.85714286
0.625 0.625 0.875 1. ]
mean value: 0.8035714285714286
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.8125 0.79464286 0.74107143 0.92857143 0.86607143
0.59821429 0.8125 0.72321429 0.71428571]
mean value: 0.7741071428571429
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.6 0.7 0.625 0.6 0.85714286 0.75
0.45454545 0.625 0.63636364 0.66666667]
mean value: 0.6514718614718614
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.73
Accuracy on Blind test: 0.89
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01000309 0.0091064 0.00996566 0.00957346 0.01001763 0.01011038
0.00918674 0.009022 0.00903392 0.00947905]
mean value: 0.009549832344055176
key: score_time
value: [0.00933886 0.00877929 0.00922394 0.00939322 0.00928879 0.00938463
0.00908303 0.00888348 0.00878763 0.0089674 ]
mean value: 0.009113025665283204
key: test_mcc
value: [ 0.13483997 0.62994079 0.19642857 0.46428571 0.34247476 0.46428571
0.05455447 -0.19642857 0.19642857 0.20044593]
mean value: 0.24872559245454634
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.5625 0.8125 0.6 0.73333333 0.66666667 0.73333333
0.53333333 0.4 0.6 0.6 ]
mean value: 0.6241666666666666
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.63157895 0.82352941 0.57142857 0.71428571 0.54545455 0.71428571
0.58823529 0.4 0.625 0.7 ]
mean value: 0.6313798198705319
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.54545455 0.77777778 0.57142857 0.71428571 0.75 0.71428571
0.55555556 0.42857143 0.625 0.58333333]
mean value: 0.626569264069264
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.875 0.57142857 0.71428571 0.42857143 0.71428571
0.625 0.375 0.625 0.875 ]
mean value: 0.6553571428571429
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.5625 0.8125 0.59821429 0.73214286 0.65178571 0.73214286
0.52678571 0.40178571 0.59821429 0.58035714]
mean value: 0.6196428571428572
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.46153846 0.7 0.4 0.55555556 0.375 0.55555556
0.41666667 0.25 0.45454545 0.53846154]
mean value: 0.4707323232323232
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.41
Accuracy on Blind test: 0.72
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.18459654 1.16294837 1.18245697 1.17819452 1.17918324 1.1352222
1.12796831 1.13954997 1.12182617 1.11643767]
mean value: 1.152838397026062
key: score_time
value: [0.09779596 0.09762621 0.09741259 0.09748626 0.08932686 0.0909512
0.09467626 0.09128833 0.15364361 0.09373999]
mean value: 0.10039472579956055
key: test_mcc
value: [0.77459667 1. 0.73214286 0.87287156 0.87287156 0.875
0.64465837 0.875 0.73214286 1. ]
mean value: 0.837928387663544
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 1. 0.86666667 0.93333333 0.93333333 0.93333333
0.8 0.93333333 0.86666667 1. ]
mean value: 0.9141666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 1. 0.85714286 0.92307692 0.92307692 0.93333333
0.84210526 0.93333333 0.875 1. ]
mean value: 0.9144211490264121
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.85714286 1. 1. 0.875
0.72727273 1. 0.875 1. ]
mean value: 0.9334415584415584
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 1. 0.85714286 0.85714286 0.85714286 1.
1. 0.875 0.875 1. ]
mean value: 0.9071428571428571
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 1. 0.86607143 0.92857143 0.92857143 0.9375
0.78571429 0.9375 0.86607143 1. ]
mean value: 0.9125
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: test_jcc
value: [0.75 1. 0.75 0.85714286 0.85714286 0.875
0.72727273 0.875 0.77777778 1. ]
mean value: 0.8469336219336219
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.8
Accuracy on Blind test: 0.92
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.79976606 0.85018516 0.8501091 0.89193416 0.93633461 0.90565896
0.87436795 0.92361069 0.89648938 0.87731433]
mean value: 0.8805770397186279
key: score_time
value: [0.23642707 0.20861435 0.2369709 0.24829936 0.22590995 0.23644233
0.15728378 0.22497702 0.20615268 0.19786143]
mean value: 0.2178938865661621
key: test_mcc
value: [0.8819171 0.8819171 0.73214286 0.87287156 0.87287156 1.
0.32732684 0.66143783 0.875 0.87287156]
mean value: 0.7978356410471296
key: train_mcc
value: [0.97058824 0.95598573 0.95630861 0.95713391 0.97080136 0.97080136
0.97120941 0.97080136 0.95629932 0.95629932]
mean value: 0.9636228629898831
key: test_accuracy
value: [0.9375 0.9375 0.86666667 0.93333333 0.93333333 1.
0.66666667 0.8 0.93333333 0.93333333]
mean value: 0.8941666666666667
key: train_accuracy
value: [0.98529412 0.97794118 0.97810219 0.97810219 0.98540146 0.98540146
0.98540146 0.98540146 0.97810219 0.97810219]
mean value: 0.9817249892657793
key: test_fscore
value: [0.93333333 0.93333333 0.85714286 0.92307692 0.92307692 1.
0.70588235 0.76923077 0.93333333 0.94117647]
mean value: 0.8919586296056884
key: train_fscore
value: [0.98529412 0.97777778 0.97810219 0.97777778 0.98550725 0.98550725
0.98507463 0.98529412 0.97777778 0.97777778]
mean value: 0.9815890655805546
key: test_precision
value: [1. 1. 0.85714286 1. 1. 1.
0.66666667 1. 1. 0.88888889]
mean value: 0.9412698412698413
key: train_precision
value: [0.98529412 0.98507463 0.98529412 1. 0.98550725 0.98550725
1. 0.98529412 0.98507463 0.98507463]
mean value: 0.9882120726291814
key: test_recall
value: [0.875 0.875 0.85714286 0.85714286 0.85714286 1.
0.75 0.625 0.875 1. ]
mean value: 0.8571428571428571
key: train_recall
value: [0.98529412 0.97058824 0.97101449 0.95652174 0.98550725 0.98550725
0.97058824 0.98529412 0.97058824 0.97058824]
mean value: 0.9751491901108269
key: test_roc_auc
value: [0.9375 0.9375 0.86607143 0.92857143 0.92857143 1.
0.66071429 0.8125 0.9375 0.92857143]
mean value: 0.89375
key: train_roc_auc
value: [0.98529412 0.97794118 0.97815431 0.97826087 0.98540068 0.98540068
0.98529412 0.98540068 0.97804774 0.97804774]
mean value: 0.9817242114237
key: test_jcc
value: [0.875 0.875 0.75 0.85714286 0.85714286 1.
0.54545455 0.625 0.875 0.88888889]
mean value: 0.8148629148629148
key: train_jcc
value: [0.97101449 0.95652174 0.95714286 0.95652174 0.97142857 0.97142857
0.97058824 0.97101449 0.95652174 0.95652174]
mean value: 0.9638704177323103
MCC on Blind test: 0.85
Accuracy on Blind test: 0.94
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00971198 0.0104022 0.00985646 0.00999689 0.01531601 0.0123229
0.00914288 0.00894403 0.00899482 0.01425886]
mean value: 0.010894703865051269
key: score_time
value: [0.00989151 0.00964308 0.00926447 0.01156521 0.01512861 0.01090312
0.00868583 0.00870919 0.0092032 0.01466846]
mean value: 0.010766267776489258
key: test_mcc
value: [ 0.40451992 0.67419986 0.46428571 0.75592895 0.6000992 0.46428571
-0.19642857 0.37796447 0.47245559 0.32732684]
mean value: 0.43446376808762277
key: train_mcc
value: [0.67911938 0.60616144 0.68986702 0.63862773 0.65701381 0.57996733
0.66746486 0.62437433 0.66581484 0.640228 ]
mean value: 0.6448638739267128
key: test_accuracy
value: [0.6875 0.8125 0.73333333 0.86666667 0.8 0.73333333
0.4 0.66666667 0.73333333 0.66666667]
mean value: 0.71
key: train_accuracy
value: [0.83823529 0.80147059 0.83941606 0.81751825 0.82481752 0.78832117
0.83211679 0.81021898 0.83211679 0.81751825]
mean value: 0.8201749677973379
key: test_fscore
value: [0.61538462 0.76923077 0.71428571 0.83333333 0.76923077 0.71428571
0.4 0.61538462 0.77777778 0.70588235]
mean value: 0.6914795661854485
key: train_fscore
value: [0.83076923 0.79069767 0.82539683 0.80916031 0.8125 0.77862595
0.82170543 0.796875 0.82442748 0.80314961]
mean value: 0.8093307503698478
key: test_precision
value: [0.8 1. 0.71428571 1. 0.83333333 0.71428571
0.42857143 0.8 0.7 0.66666667]
mean value: 0.7657142857142857
key: train_precision
value: [0.87096774 0.83606557 0.9122807 0.85483871 0.88135593 0.82258065
0.86885246 0.85 0.85714286 0.86440678]
mean value: 0.8618491400322729
key: test_recall
value: [0.5 0.625 0.71428571 0.71428571 0.71428571 0.71428571
0.375 0.5 0.875 0.75 ]
mean value: 0.6482142857142857
key: train_recall
value: [0.79411765 0.75 0.75362319 0.76811594 0.75362319 0.73913043
0.77941176 0.75 0.79411765 0.75 ]
mean value: 0.7632139812446718
key: test_roc_auc
value: [0.6875 0.8125 0.73214286 0.85714286 0.79464286 0.73214286
0.40178571 0.67857143 0.72321429 0.66071429]
mean value: 0.7080357142857143
key: train_roc_auc
value: [0.83823529 0.80147059 0.84004689 0.8178815 0.82534101 0.78868286
0.83173487 0.80978261 0.83184143 0.81702899]
mean value: 0.8202046035805627
key: test_jcc
value: [0.44444444 0.625 0.55555556 0.71428571 0.625 0.55555556
0.25 0.44444444 0.63636364 0.54545455]
mean value: 0.5396103896103897
key: train_jcc
value: [0.71052632 0.65384615 0.7027027 0.67948718 0.68421053 0.6375
0.69736842 0.66233766 0.7012987 0.67105263]
mean value: 0.6800330294409241
MCC on Blind test: 0.26
Accuracy on Blind test: 0.69
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.14137769 0.04036045 0.04341197 0.04366136 0.0674777 0.04775929
0.04809213 0.04925728 0.04873991 0.04506922]
mean value: 0.057520699501037595
key: score_time
value: [0.01150417 0.01063967 0.01087403 0.01018643 0.01102757 0.0106461
0.01055002 0.01029372 0.01260114 0.01114511]
mean value: 0.01094679832458496
key: test_mcc
value: [0.77459667 1. 1. 0.87287156 1. 0.875
0.87287156 1. 0.875 1. ]
mean value: 0.9270339791129423
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 1. 1. 0.93333333 1. 0.93333333
0.93333333 1. 0.93333333 1. ]
mean value: 0.9608333333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 1. 1. 0.92307692 1. 0.93333333
0.94117647 1. 0.93333333 1. ]
mean value: 0.9588062917474682
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 1. 0.875
0.88888889 1. 1. 1. ]
mean value: 0.9763888888888889
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 1. 1. 0.85714286 1. 1.
1. 1. 0.875 1. ]
mean value: 0.9482142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 1. 1. 0.92857143 1. 0.9375
0.92857143 1. 0.9375 1. ]
mean value: 0.9607142857142857
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 1. 1. 0.85714286 1. 0.875
0.88888889 1. 0.875 1. ]
mean value: 0.9246031746031746
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.88
Accuracy on Blind test: 0.96
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.0207808 0.02300382 0.0433135 0.05539727 0.05441689 0.05075097
0.05279303 0.04780746 0.06620932 0.02348566]
mean value: 0.04379587173461914
key: score_time
value: [0.01240396 0.02174282 0.02213216 0.02098989 0.0220952 0.01775432
0.01941085 0.02169156 0.01264215 0.02482295]
mean value: 0.019568586349487306
key: test_mcc
value: [-0.12598816 0.12598816 0.47245559 0.32732684 0.04029115 0.46428571
-0.07142857 0.26189246 0.6000992 0.32732684]
mean value: 0.24222492144851507
key: train_mcc
value: [1. 0.98540068 1. 0.97080136 0.97080136 0.98550418
1. 0.98550725 1. 0.98550418]
mean value: 0.9883519009251238
key: test_accuracy
value: [0.4375 0.5625 0.73333333 0.66666667 0.53333333 0.73333333
0.46666667 0.6 0.8 0.66666667]
mean value: 0.62
key: train_accuracy
value: [1. 0.99264706 1. 0.98540146 0.98540146 0.99270073
1. 0.99270073 1. 0.99270073]
mean value: 0.9941552168312581
key: test_fscore
value: [0.4 0.58823529 0.66666667 0.61538462 0.36363636 0.71428571
0.5 0.5 0.82352941 0.70588235]
mean value: 0.587762041879689
key: train_fscore
value: [1. 0.99270073 1. 0.98550725 0.98550725 0.99280576
1. 0.99270073 1. 0.99259259]
mean value: 0.9941814300595915
key: test_precision
value: [0.42857143 0.55555556 0.8 0.66666667 0.5 0.71428571
0.5 0.75 0.77777778 0.66666667]
mean value: 0.6359523809523809
key: train_precision
value: [1. 0.98550725 1. 0.98550725 0.98550725 0.98571429
1. 0.98550725 1. 1. ]
mean value: 0.9927743271221532
key: test_recall
value: [0.375 0.625 0.57142857 0.57142857 0.28571429 0.71428571
0.5 0.375 0.875 0.75 ]
mean value: 0.5642857142857143
key: train_recall
value: [1. 1. 1. 0.98550725 0.98550725 1.
1. 1. 1. 0.98529412]
mean value: 0.9956308610400683
key: test_roc_auc
value: [0.4375 0.5625 0.72321429 0.66071429 0.51785714 0.73214286
0.46428571 0.61607143 0.79464286 0.66071429]
mean value: 0.6169642857142857
key: train_roc_auc
value: [1. 0.99264706 1. 0.98540068 0.98540068 0.99264706
1. 0.99275362 1. 0.99264706]
mean value: 0.9941496163682865
key: test_jcc
value: [0.25 0.41666667 0.5 0.44444444 0.22222222 0.55555556
0.33333333 0.33333333 0.7 0.54545455]
mean value: 0.4301010101010101
key: train_jcc
value: [1. 0.98550725 1. 0.97142857 0.97142857 0.98571429
1. 0.98550725 1. 0.98529412]
mean value: 0.988488003897211
MCC on Blind test: 0.61
Accuracy on Blind test: 0.86
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02475095 0.00916529 0.00877476 0.00865769 0.01192904 0.00877118
0.00866795 0.00887179 0.01024485 0.00899529]
mean value: 0.010882878303527832
key: score_time
value: [0.00913358 0.00896764 0.00864172 0.00901842 0.00979638 0.00859523
0.00856853 0.00883698 0.00979352 0.01023078]
mean value: 0.00915827751159668
key: test_mcc
value: [0.67419986 0.75 0.73214286 0.21821789 0.87287156 0.75592895
0.19642857 0.66143783 0.32732684 0.34247476]
mean value: 0.553102911106401
key: train_mcc
value: [0.67676337 0.63406934 0.678815 0.63512361 0.73758262 0.66432225
0.79590547 0.69352089 0.73721228 0.62060153]
mean value: 0.6873916366987387
key: test_accuracy
value: [0.8125 0.875 0.86666667 0.6 0.93333333 0.86666667
0.6 0.8 0.66666667 0.66666667]
mean value: 0.76875
key: train_accuracy
value: [0.83823529 0.81617647 0.83941606 0.81751825 0.86861314 0.83211679
0.89781022 0.84671533 0.86861314 0.81021898]
mean value: 0.8435433662516101
key: test_fscore
value: [0.76923077 0.875 0.85714286 0.625 0.92307692 0.83333333
0.625 0.76923077 0.70588235 0.73684211]
mean value: 0.7719739110218986
key: train_fscore
value: [0.84057971 0.82269504 0.84057971 0.81751825 0.86764706 0.83211679
0.89552239 0.84671533 0.86764706 0.80597015]
mean value: 0.8436991475674843
key: test_precision
value: [1. 0.875 0.85714286 0.55555556 1. 1.
0.625 1. 0.66666667 0.63636364]
mean value: 0.8215728715728716
key: train_precision
value: [0.82857143 0.79452055 0.84057971 0.82352941 0.88059701 0.83823529
0.90909091 0.84057971 0.86764706 0.81818182]
mean value: 0.8441532903710471
key: test_recall
value: [0.625 0.875 0.85714286 0.71428571 0.85714286 0.71428571
0.625 0.625 0.75 0.875 ]
mean value: 0.7517857142857143
key: train_recall
value: [0.85294118 0.85294118 0.84057971 0.8115942 0.85507246 0.82608696
0.88235294 0.85294118 0.86764706 0.79411765]
mean value: 0.8436274509803922
key: test_roc_auc
value: [0.8125 0.875 0.86607143 0.60714286 0.92857143 0.85714286
0.59821429 0.8125 0.66071429 0.65178571]
mean value: 0.7669642857142858
key: train_roc_auc
value: [0.83823529 0.81617647 0.8394075 0.81756181 0.8687127 0.83216113
0.89769821 0.84676044 0.86860614 0.8101023 ]
mean value: 0.8435421994884911
key: test_jcc
value: [0.625 0.77777778 0.75 0.45454545 0.85714286 0.71428571
0.45454545 0.625 0.54545455 0.58333333]
mean value: 0.6387085137085137
key: train_jcc
value: [0.725 0.69879518 0.725 0.69135802 0.76623377 0.7125
0.81081081 0.73417722 0.76623377 0.675 ]
mean value: 0.7305108763882466
MCC on Blind test: 0.56
Accuracy on Blind test: 0.81
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0145216 0.01501608 0.01437402 0.01481843 0.01543903 0.01425052
0.01434255 0.01563168 0.014395 0.01469946]
mean value: 0.014748835563659668
key: score_time
value: [0.01158118 0.01175547 0.01154518 0.01160622 0.01153302 0.01158404
0.01175642 0.01160836 0.01161551 0.01179576]
mean value: 0.011638116836547852
key: test_mcc
value: [0.8819171 0.67419986 0.75592895 0.73214286 0.73214286 0.53452248
0.6000992 0.56407607 0.60714286 0.53452248]
mean value: 0.6616694724214908
key: train_mcc
value: [0.94280904 0.8623165 0.8130258 0.95713391 0.95629932 0.88938138
0.92944673 0.81250852 0.88920184 0.9158731 ]
mean value: 0.8967996137276029
key: test_accuracy
value: [0.9375 0.8125 0.86666667 0.86666667 0.86666667 0.73333333
0.8 0.73333333 0.8 0.73333333]
mean value: 0.815
key: train_accuracy
value: [0.97058824 0.92647059 0.89781022 0.97810219 0.97810219 0.94160584
0.96350365 0.89781022 0.94160584 0.95620438]
mean value: 0.9451803349076857
key: test_fscore
value: [0.93333333 0.76923077 0.83333333 0.85714286 0.85714286 0.6
0.82352941 0.66666667 0.8 0.8 ]
mean value: 0.7940379228614522
key: train_fscore
value: [0.96969697 0.92063492 0.88709677 0.97777778 0.97841727 0.93846154
0.96183206 0.8852459 0.9375 0.95384615]
mean value: 0.9410509363506006
key: test_precision
value: [1. 1. 1. 0.85714286 0.85714286 1.
0.77777778 1. 0.85714286 0.66666667]
mean value: 0.9015873015873016
key: train_precision
value: [1. 1. 1. 1. 0.97142857 1.
1. 1. 1. 1. ]
mean value: 0.9971428571428571
key: test_recall
value: [0.875 0.625 0.71428571 0.85714286 0.85714286 0.42857143
0.875 0.5 0.75 1. ]
mean value: 0.7482142857142857
key: train_recall
value: [0.94117647 0.85294118 0.79710145 0.95652174 0.98550725 0.88405797
0.92647059 0.79411765 0.88235294 0.91176471]
mean value: 0.8932011935208866
key: test_roc_auc
value: [0.9375 0.8125 0.85714286 0.86607143 0.86607143 0.71428571
0.79464286 0.75 0.80357143 0.71428571]
mean value: 0.8116071428571429
key: train_roc_auc
value: [0.97058824 0.92647059 0.89855072 0.97826087 0.97804774 0.94202899
0.96323529 0.89705882 0.94117647 0.95588235]
mean value: 0.9451300085251492
key: test_jcc
value: [0.875 0.625 0.71428571 0.75 0.75 0.42857143
0.7 0.5 0.66666667 0.66666667]
mean value: 0.6676190476190476
key: train_jcc
value: [0.94117647 0.85294118 0.79710145 0.95652174 0.95774648 0.88405797
0.92647059 0.79411765 0.88235294 0.91176471]
mean value: 0.8904251167705294
MCC on Blind test: 0.81
Accuracy on Blind test: 0.93
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01324773 0.01418781 0.01375175 0.01332355 0.01351047 0.01325345
0.01305151 0.01286387 0.01371384 0.01300216]
mean value: 0.013390612602233887
key: score_time
value: [0.0118475 0.01155829 0.01162076 0.01165366 0.01165652 0.01157427
0.01155686 0.01157951 0.01150799 0.01157808]
mean value: 0.0116133451461792
key: test_mcc
value: [0.37796447 0.48038446 0.75592895 0.64465837 0.49099025 0.64465837
0.6000992 0.76376262 0.73214286 0.64465837]
mean value: 0.6135247918252648
key: train_mcc
value: [0.70321085 0.6799747 0.92951942 0.78854812 0.81433714 0.82543222
0.8978896 0.89869927 0.88938138 0.88654289]
mean value: 0.8313535585121352
key: test_accuracy
value: [0.625 0.6875 0.86666667 0.8 0.73333333 0.8
0.8 0.86666667 0.86666667 0.8 ]
mean value: 0.7845833333333334
key: train_accuracy
value: [0.83088235 0.81617647 0.96350365 0.88321168 0.90510949 0.90510949
0.94890511 0.94890511 0.94160584 0.94160584]
mean value: 0.9085015027908974
key: test_fscore
value: [0.4 0.54545455 0.83333333 0.72727273 0.75 0.72727273
0.82352941 0.85714286 0.875 0.84210526]
mean value: 0.7381110865398791
key: train_fscore
value: [0.79646018 0.77477477 0.96240602 0.86885246 0.91034483 0.896
0.94814815 0.94964029 0.94444444 0.93846154]
mean value: 0.8989532672230035
key: test_precision
value: [1. 1. 1. 1. 0.66666667 1.
0.77777778 1. 0.875 0.72727273]
mean value: 0.9046717171717171
key: train_precision
value: [1. 1. 1. 1. 0.86842105 1.
0.95522388 0.92957746 0.89473684 0.98387097]
mean value: 0.9631830207864525
key: test_recall
value: [0.25 0.375 0.71428571 0.57142857 0.85714286 0.57142857
0.875 0.75 0.875 1. ]
mean value: 0.6839285714285714
key: train_recall
value: [0.66176471 0.63235294 0.92753623 0.76811594 0.95652174 0.8115942
0.94117647 0.97058824 1. 0.89705882]
mean value: 0.8566709292412618
key: test_roc_auc
value: [0.625 0.6875 0.85714286 0.78571429 0.74107143 0.78571429
0.79464286 0.875 0.86607143 0.78571429]
mean value: 0.7803571428571429
key: train_roc_auc
value: [0.83088235 0.81617647 0.96376812 0.88405797 0.90473146 0.9057971
0.9488491 0.94906223 0.94202899 0.94128303]
mean value: 0.9086636828644501
key: test_jcc
value: [0.25 0.375 0.71428571 0.57142857 0.6 0.57142857
0.7 0.75 0.77777778 0.72727273]
mean value: 0.6037193362193363
key: train_jcc
value: [0.66176471 0.63235294 0.92753623 0.76811594 0.83544304 0.8115942
0.90140845 0.90410959 0.89473684 0.88405797]
mean value: 0.8221119914710179
MCC on Blind test: 0.57
Accuracy on Blind test: 0.83
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.11366677 0.0940814 0.10225201 0.10271406 0.09806108 0.09513283
0.09545851 0.09452772 0.09964037 0.09940076]
mean value: 0.09949355125427246
key: score_time
value: [0.01469493 0.01464295 0.02262974 0.01577759 0.01471615 0.0148375
0.0147779 0.01488066 0.01598573 0.01483297]
mean value: 0.01577761173248291
key: test_mcc
value: [0.77459667 1. 0.76376262 0.87287156 1. 1.
0.87287156 1. 0.875 1. ]
mean value: 0.9159102406955395
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 1. 0.86666667 0.93333333 1. 1.
0.93333333 1. 0.93333333 1. ]
mean value: 0.9541666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 1. 0.875 0.92307692 1. 1.
0.94117647 1. 0.93333333 1. ]
mean value: 0.9529729584141349
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.77777778 1. 1. 1.
0.88888889 1. 1. 1. ]
mean value: 0.9666666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 1. 1. 0.85714286 1. 1.
1. 1. 0.875 1. ]
mean value: 0.9482142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 1. 0.875 0.92857143 1. 1.
0.92857143 1. 0.9375 1. ]
mean value: 0.9544642857142858
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 1. 0.77777778 0.85714286 1. 1.
0.88888889 1. 0.875 1. ]
mean value: 0.9148809523809524
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.88
Accuracy on Blind test: 0.96
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.0341208 0.03071618 0.06870103 0.02894068 0.03026152 0.03227735
0.03287911 0.0358839 0.04638481 0.04289818]
mean value: 0.038306355476379395
key: score_time
value: [0.02494836 0.02263951 0.02276754 0.01839662 0.01910567 0.02346301
0.02387643 0.02339864 0.03518057 0.03576827]
mean value: 0.024954462051391603
key: test_mcc
value: [0.77459667 1. 1. 0.87287156 1. 1.
0.87287156 1. 0.875 1. ]
mean value: 0.9395339791129422
key: train_mcc
value: [0.98540068 0.98540068 0.98550725 0.98550725 0.98550725 0.98550725
0.97120941 0.98550418 1. 0.98550418]
mean value: 0.9855048108412058
key: test_accuracy
value: [0.875 1. 1. 0.93333333 1. 1.
0.93333333 1. 0.93333333 1. ]
mean value: 0.9675
key: train_accuracy
value: [0.99264706 0.99264706 0.99270073 0.99270073 0.99270073 0.99270073
0.98540146 0.99270073 1. 0.99270073]
mean value: 0.9926899957063118
key: test_fscore
value: [0.85714286 1. 1. 0.92307692 1. 1.
0.94117647 1. 0.93333333 1. ]
mean value: 0.9654729584141348
key: train_fscore
value: [0.99259259 0.99259259 0.99270073 0.99270073 0.99270073 0.99270073
0.98507463 0.99259259 1. 0.99259259]
mean value: 0.9926247916944072
key: test_precision
value: [1. 1. 1. 1. 1. 1.
0.88888889 1. 1. 1. ]
mean value: 0.9888888888888889
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 1. 1. 0.85714286 1. 1.
1. 1. 0.875 1. ]
mean value: 0.9482142857142857
key: train_recall
value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725
0.97058824 0.98529412 1. 0.98529412]
mean value: 0.98537936913896
key: test_roc_auc
value: [0.875 1. 1. 0.92857143 1. 1.
0.92857143 1. 0.9375 1. ]
mean value: 0.9669642857142857
key: train_roc_auc
value: [0.99264706 0.99264706 0.99275362 0.99275362 0.99275362 0.99275362
0.98529412 0.99264706 1. 0.99264706]
mean value: 0.9926896845694799
key: test_jcc
value: [0.75 1. 1. 0.85714286 1. 1.
0.88888889 1. 0.875 1. ]
mean value: 0.9371031746031746
key: train_jcc
value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725
0.97058824 0.98529412 1. 0.98529412]
mean value: 0.98537936913896
MCC on Blind test: 0.88
Accuracy on Blind test: 0.96
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.03243661 0.04290581 0.07457685 0.05478549 0.05016804 0.04229784
0.05081439 0.05759549 0.05455852 0.05034065]
mean value: 0.05104796886444092
key: score_time
value: [0.01870847 0.02562094 0.02191496 0.0251255 0.03725863 0.0223453
0.02001977 0.02464914 0.01784778 0.02232647]
mean value: 0.023581695556640626
key: test_mcc
value: [0.51639778 0.51639778 0.47245559 0.21821789 0.64465837 0.6000992
0.07142857 0.46770717 0.32732684 0.60714286]
mean value: 0.4441832047127614
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75 0.75 0.73333333 0.6 0.8 0.8
0.53333333 0.66666667 0.66666667 0.8 ]
mean value: 0.71
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.71428571 0.71428571 0.66666667 0.625 0.72727273 0.76923077
0.53333333 0.54545455 0.70588235 0.8 ]
mean value: 0.6801411823470647
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.83333333 0.83333333 0.8 0.55555556 1. 0.83333333
0.57142857 1. 0.66666667 0.85714286]
mean value: 0.795079365079365
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.625 0.625 0.57142857 0.71428571 0.57142857 0.71428571
0.5 0.375 0.75 0.75 ]
mean value: 0.6196428571428572
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.75 0.72321429 0.60714286 0.78571429 0.79464286
0.53571429 0.6875 0.66071429 0.80357143]
mean value: 0.7098214285714286
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.55555556 0.55555556 0.5 0.45454545 0.57142857 0.625
0.36363636 0.375 0.54545455 0.66666667]
mean value: 0.5212842712842712
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.3
Accuracy on Blind test: 0.68
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.26349664 0.30767131 0.26317835 0.27116275 0.31355691 0.28043199
0.25030661 0.25096774 0.2339437 0.26458788]
mean value: 0.26993038654327395
key: score_time
value: [0.01109076 0.01076746 0.00925684 0.01056147 0.01435041 0.01452732
0.009161 0.00908542 0.00923038 0.00994611]
mean value: 0.010797715187072754
key: test_mcc
value: [0.77459667 1. 1. 0.87287156 1. 0.87287156
0.75592895 1. 0.875 1. ]
mean value: 0.9151268737147877
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 1. 1. 0.93333333 1. 0.93333333
0.86666667 1. 0.93333333 1. ]
mean value: 0.9541666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 1. 1. 0.92307692 1. 0.92307692
0.88888889 1. 0.93333333 1. ]
mean value: 0.9525518925518925
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 1. 1. 0.8 1. 1. 1. ]
mean value: 0.98
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 1. 1. 0.85714286 1. 0.85714286
1. 1. 0.875 1. ]
mean value: 0.9339285714285714
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 1. 1. 0.92857143 1. 0.92857143
0.85714286 1. 0.9375 1. ]
mean value: 0.9526785714285715
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 1. 1. 0.85714286 1. 0.85714286
0.8 1. 0.875 1. ]
mean value: 0.9139285714285714
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.88
Accuracy on Blind test: 0.96
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01642203 0.01763225 0.02696109 0.01796865 0.01799726 0.01788306
0.01834369 0.02878118 0.01888704 0.02675176]
mean value: 0.02076280117034912
key: score_time
value: [0.01238465 0.01226735 0.01235175 0.01354599 0.01394749 0.01234221
0.01371408 0.01333451 0.01352239 0.01251984]
mean value: 0.01299302577972412
key: test_mcc
value: [ 0.12598816 0. -0.05455447 -0.13363062 -0.64465837 0.04029115
0.49099025 -0.19642857 0.19642857 0.33928571]
mean value: 0.016371180845219407
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.5625 0.5 0.46666667 0.46666667 0.2 0.53333333
0.73333333 0.4 0.6 0.66666667]
mean value: 0.5129166666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.58823529 0.5 0.5 0.2 0.33333333 0.36363636
0.71428571 0.4 0.625 0.66666667]
mean value: 0.4891157372039725
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.55555556 0.5 0.44444444 0.33333333 0.27272727 0.5
0.83333333 0.42857143 0.625 0.71428571]
mean value: 0.5207251082251082
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.625 0.5 0.57142857 0.14285714 0.42857143 0.28571429
0.625 0.375 0.625 0.625 ]
mean value: 0.48035714285714287
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.5625 0.5 0.47321429 0.44642857 0.21428571 0.51785714
0.74107143 0.40178571 0.59821429 0.66964286]
mean value: 0.5125
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.41666667 0.33333333 0.33333333 0.11111111 0.2 0.22222222
0.55555556 0.25 0.45454545 0.5 ]
mean value: 0.3376767676767677
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.49
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02977848 0.03747535 0.04021692 0.03384948 0.03854942 0.03706765
0.03542995 0.03389931 0.03391671 0.03379488]
mean value: 0.035397815704345706
key: score_time
value: [0.02066422 0.02376986 0.02077508 0.02034974 0.02166724 0.02019167
0.02067709 0.02275515 0.02010059 0.02273393]
mean value: 0.02136845588684082
key: test_mcc
value: [0.62994079 1. 0.87287156 0.76376262 0.87287156 0.87287156
0.64465837 0.66143783 0.73214286 0.53452248]
mean value: 0.7585079626960751
key: train_mcc
value: [0.97058824 0.97058824 0.97080136 0.97080136 0.97080136 0.97080136
0.98550418 0.97080136 0.97080136 0.97080136]
mean value: 0.9722290198043756
key: test_accuracy
value: [0.8125 1. 0.93333333 0.86666667 0.93333333 0.93333333
0.8 0.8 0.86666667 0.73333333]
mean value: 0.8679166666666667
key: train_accuracy
value: [0.98529412 0.98529412 0.98540146 0.98540146 0.98540146 0.98540146
0.99270073 0.98540146 0.98540146 0.98540146]
mean value: 0.9861099184199227
key: test_fscore
value: [0.8 1. 0.92307692 0.875 0.92307692 0.92307692
0.84210526 0.76923077 0.875 0.8 ]
mean value: 0.8730566801619433
key: train_fscore
value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725
0.99259259 0.98529412 0.98529412 0.98529412]
mean value: 0.9861092166335134
key: test_precision
value: [0.85714286 1. 1. 0.77777778 1. 1.
0.72727273 1. 0.875 0.66666667]
mean value: 0.8903860028860029
key: train_precision
value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725
1. 0.98529412 0.98529412 0.98529412]
mean value: 0.9868499573742541
key: test_recall
value: [0.75 1. 0.85714286 1. 0.85714286 0.85714286
1. 0.625 0.875 1. ]
mean value: 0.8821428571428571
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:168: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_recall
value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725
0.98529412 0.98529412 0.98529412 0.98529412]
mean value: 0.98537936913896
key: test_roc_auc
value: [0.8125 1. 0.92857143 0.875 0.92857143 0.92857143
0.78571429 0.8125 0.86607143 0.71428571]
mean value: 0.8651785714285715
key: train_roc_auc
value: [0.98529412 0.98529412 0.98540068 0.98540068 0.98540068 0.98540068
0.99264706 0.98540068 0.98540068 0.98540068]
mean value: 0.9861040068201194
key: test_jcc
value: [0.66666667 1. 0.85714286 0.77777778 0.85714286 0.85714286
0.72727273 0.625 0.77777778 0.66666667]
mean value: 0.7812590187590187
key: train_jcc
value: [0.97101449 0.97101449 0.97142857 0.97142857 0.97142857 0.97142857
0.98529412 0.97101449 0.97101449 0.97101449]
mean value: 0.9726080867129461
MCC on Blind test: 0.76
Accuracy on Blind test: 0.91
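The SettingWithCopyWarning raised at embb_8020.py:168 and :171 points at `sort_values(..., inplace = True)` being called on `rus_CT` and `rus_BT`, which suggests both were obtained by slicing another DataFrame. A minimal sketch of one common way to avoid the warning, assuming a hypothetical parent frame `scores_df` with made-up values (neither is taken from the log): take an explicit copy of the slice and assign the sorted result instead of sorting in place.

```python
import pandas as pd

# Hypothetical stand-in for the results table; the real layout is not shown in the log.
scores_df = pd.DataFrame({
    "model": ["A", "B", "C"],
    "test_mcc": [0.76, 0.85, 0.69],
    "bts_mcc": [0.76, 0.69, 0.80],
})

# Explicit copy: rus_CT owns its data, so later modifications are unambiguous.
rus_CT = scores_df[["model", "test_mcc"]].copy()

# Assign the sorted frame rather than calling inplace=True on a slice.
rus_CT = rus_CT.sort_values(by=["test_mcc"], ascending=False)

print(rus_CT)
```

The same pattern would apply to `rus_BT` with `by=["bts_mcc"]`.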
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.20923996 0.30740142 0.23965693 0.30709243 0.33208108 0.24986792
0.23055601 0.22767615 0.24073982 0.21366882]
mean value: 0.2557980537414551
key: score_time
value: [0.02369094 0.02021146 0.02024269 0.02145529 0.02396369 0.02035141
0.02180147 0.02421188 0.0203321 0.02221036]
mean value: 0.021847128868103027
key: test_mcc
value: [0.62994079 1. 0.87287156 0.76376262 0.87287156 0.87287156
0.64465837 0.66143783 0.73214286 0.53452248]
mean value: 0.7585079626960751
key: train_mcc
value: [0.97058824 0.97058824 0.97080136 0.97080136 0.97080136 0.97080136
0.98550418 0.97080136 0.97080136 0.97080136]
mean value: 0.9722290198043756
key: test_accuracy
value: [0.8125 1. 0.93333333 0.86666667 0.93333333 0.93333333
0.8 0.8 0.86666667 0.73333333]
mean value: 0.8679166666666667
key: train_accuracy
value: [0.98529412 0.98529412 0.98540146 0.98540146 0.98540146 0.98540146
0.99270073 0.98540146 0.98540146 0.98540146]
mean value: 0.9861099184199227
key: test_fscore
value: [0.8 1. 0.92307692 0.875 0.92307692 0.92307692
0.84210526 0.76923077 0.875 0.8 ]
mean value: 0.8730566801619433
key: train_fscore
value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725
0.99259259 0.98529412 0.98529412 0.98529412]
mean value: 0.9861092166335134
key: test_precision
value: [0.85714286 1. 1. 0.77777778 1. 1.
0.72727273 1. 0.875 0.66666667]
mean value: 0.8903860028860029
key: train_precision
value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725
1. 0.98529412 0.98529412 0.98529412]
mean value: 0.9868499573742541
key: test_recall
value: [0.75 1. 0.85714286 1. 0.85714286 0.85714286
1. 0.625 0.875 1. ]
mean value: 0.8821428571428571
key: train_recall
value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725
0.98529412 0.98529412 0.98529412 0.98529412]
mean value: 0.98537936913896
key: test_roc_auc
value: [0.8125 1. 0.92857143 0.875 0.92857143 0.92857143
0.78571429 0.8125 0.86607143 0.71428571]
mean value: 0.8651785714285715
key: train_roc_auc
value: [0.98529412 0.98529412 0.98540068 0.98540068 0.98540068 0.98540068
0.99264706 0.98540068 0.98540068 0.98540068]
mean value: 0.9861040068201194
key: test_jcc
value: [0.66666667 1. 0.85714286 0.77777778 0.85714286 0.85714286
0.72727273 0.625 0.77777778 0.66666667]
mean value: 0.7812590187590187
key: train_jcc
value: [0.97101449 0.97101449 0.97142857 0.97142857 0.97142857 0.97142857
0.98529412 0.97101449 0.97101449 0.97101449]
mean value: 0.9726080867129461
MCC on Blind test: 0.76
Accuracy on Blind test: 0.91
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04199767 0.03794289 0.03837681 0.05303359 0.15559244 0.13856936
0.05898094 0.04978466 0.07463431 0.04460335]
mean value: 0.06935160160064698
key: score_time
value: [0.01235557 0.01329517 0.01337838 0.01518774 0.02457047 0.03380871
0.0172708 0.01403213 0.02101779 0.01378417]
mean value: 0.017870092391967775
key: test_mcc
value: [0.9321832 0.93202124 0.8951918 0.82490815 0.71611487 0.78772636
0.89342711 0.89342711 0.78772636 0.80439967]
mean value: 0.8467125880292498
key: train_mcc
value: [0.89746503 0.90927764 0.89754406 0.90138653 0.90163769 0.88213591
0.93313595 0.90551181 0.92125984 0.92520402]
mean value: 0.9074558497858128
key: test_accuracy
value: [0.96491228 0.96491228 0.94736842 0.9122807 0.85714286 0.89285714
0.94642857 0.94642857 0.89285714 0.89285714]
mean value: 0.9218045112781955
key: train_accuracy
value: [0.94871795 0.95463511 0.94871795 0.95069034 0.9507874 0.94094488
0.96653543 0.95275591 0.96062992 0.96259843]
mean value: 0.9537013309726816
key: test_fscore
value: [0.96551724 0.96296296 0.94915254 0.91525424 0.86206897 0.89655172
0.94736842 0.94736842 0.88888889 0.90322581]
mean value: 0.9238359211104228
key: train_fscore
value: [0.9486166 0.95463511 0.94820717 0.95049505 0.95107632 0.94163424
0.9667319 0.95275591 0.96062992 0.96252465]
mean value: 0.9537306872118687
key: test_precision
value: [0.93333333 1. 0.93333333 0.9 0.83333333 0.86666667
0.93103448 0.93103448 0.92307692 0.82352941]
mean value: 0.9075341967025538
key: train_precision
value: [0.95238095 0.95652174 0.95582329 0.95238095 0.94552529 0.93076923
0.96108949 0.95275591 0.96062992 0.96442688]
mean value: 0.9532303658068488
key: test_recall
value: [1. 0.92857143 0.96551724 0.93103448 0.89285714 0.92857143
0.96428571 0.96428571 0.85714286 1. ]
mean value: 0.9432266009852217
key: train_recall
value: [0.94488189 0.95275591 0.94071146 0.9486166 0.95669291 0.95275591
0.97244094 0.95275591 0.96062992 0.96062992]
mean value: 0.9542871370327721
key: test_roc_auc
value: [0.96551724 0.96428571 0.94704433 0.91194581 0.85714286 0.89285714
0.94642857 0.94642857 0.89285714 0.89285714]
mean value: 0.9217364532019705
key: train_roc_auc
value: [0.94872553 0.95463882 0.94870219 0.95068625 0.9507874 0.94094488
0.96653543 0.95275591 0.96062992 0.96259843]
mean value: 0.9537004761756559
key: test_jcc
value: [0.93333333 0.92857143 0.90322581 0.84375 0.75757576 0.8125
0.9 0.9 0.8 0.82352941]
mean value: 0.8602485737696839
key: train_jcc
value: [0.90225564 0.91320755 0.90151515 0.90566038 0.90671642 0.88970588
0.93560606 0.90977444 0.92424242 0.92775665]
mean value: 0.9116440590335693
MCC on Blind test: 0.69
Accuracy on Blind test: 0.88
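The lbfgs ConvergenceWarning repeated during this run names two remedies: scale the data or raise `max_iter`. The logged pipeline already applies `MinMaxScaler` to the numeric columns, so raising `max_iter` is the remaining option. A minimal sketch under stated assumptions: the data is synthetic (`make_classification`), the value `max_iter=1000` is an arbitrary illustration, and the pipeline is a simplified stand-in for the logged one, not the script's actual code.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; the real feature matrix is not part of the log.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Same step names as the logged pipeline, but with a larger iteration budget
# (the default max_iter for LogisticRegression is 100).
pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

# Record warnings so convergence can be checked programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipe.fit(X, y)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings raised:", n_conv)
print("lbfgs iterations used:", pipe.named_steps["model"].n_iter_)
```

Comparing `n_iter_` with `max_iter` shows whether the solver stopped because it converged or because it ran out of iterations.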
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.9814384 1.1654892 1.1827333 1.37071514 1.02586007 1.44153547
1.04063892 1.14810324 0.88139033 0.93454194]
mean value: 1.1172446012496948
key: score_time
value: [0.01375628 0.02266359 0.02546477 0.01354003 0.01353526 0.01361775
0.01248193 0.0207324 0.01353216 0.01379561]
mean value: 0.01631197929382324
key: test_mcc
value: [0.86851042 0.8951918 0.93202124 0.89952865 0.82195294 0.85714286
0.93094934 0.96490128 0.85933785 0.93094934]
mean value: 0.8960485710881759
key: train_mcc
value: [0.99211042 0.98028384 0.98817342 0.98425172 0.98032256 0.99212598
0.99212598 1. 0.99212598 0.98819663]
mean value: 0.9889716545512739
key: test_accuracy
value: [0.92982456 0.94736842 0.96491228 0.94736842 0.91071429 0.92857143
0.96428571 0.98214286 0.92857143 0.96428571]
mean value: 0.9468045112781955
key: train_accuracy
value: [0.99605523 0.99013807 0.99408284 0.99211045 0.99015748 0.99606299
0.99606299 1. 0.99606299 0.99409449]
mean value: 0.9944827532653093
key: test_fscore
value: [0.93333333 0.94545455 0.96666667 0.95081967 0.9122807 0.92857143
0.96551724 0.98245614 0.93103448 0.96551724]
mean value: 0.9481651453779626
key: train_fscore
value: [0.99606299 0.99013807 0.99408284 0.99212598 0.99013807 0.99606299
0.99606299 1. 0.99606299 0.99410609]
mean value: 0.9944843017488161
key: test_precision
value: [0.875 0.96296296 0.93548387 0.90625 0.89655172 0.92857143
0.93333333 0.96551724 0.9 0.93333333]
mean value: 0.9237003894686041
key: train_precision
value: [0.99606299 0.99209486 0.99212598 0.98823529 0.99209486 0.99606299
0.99606299 1. 0.99606299 0.99215686]
mean value: 0.9940959832938809
key: test_recall
value: [1. 0.92857143 1. 1. 0.92857143 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.975
key: train_recall
value: [0.99606299 0.98818898 0.99604743 0.99604743 0.98818898 0.99606299
0.99606299 1. 0.99606299 0.99606299]
mean value: 0.9948787775045906
key: test_roc_auc
value: [0.93103448 0.94704433 0.96428571 0.94642857 0.91071429 0.92857143
0.96428571 0.98214286 0.92857143 0.96428571]
mean value: 0.9467364532019705
key: train_roc_auc
value: [0.99605521 0.99014192 0.99408671 0.9921182 0.99015748 0.99606299
0.99606299 1. 0.99606299 0.99409449]
mean value: 0.9944842986523917
key: test_jcc
value: [0.875 0.89655172 0.93548387 0.90625 0.83870968 0.86666667
0.93333333 0.96551724 0.87096774 0.93333333]
mean value: 0.9021813589173155
key: train_jcc
value: [0.99215686 0.98046875 0.98823529 0.984375 0.98046875 0.99215686
0.99215686 1. 0.99215686 0.98828125]
mean value: 0.9890456495098039
MCC on Blind test: 0.81
Accuracy on Blind test: 0.93
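The score keys above (mcc, accuracy, fscore, precision, recall, roc_auc, jcc) can all be derived from the four confusion-matrix counts of a fold. The sketch below is a plain-Python illustration, not the code that produced this log. The function name `binary_metrics` is hypothetical. The counts TP=28, FP=4, TN=25, FN=0 are not printed anywhere in the log; they are an inference that reproduces the first-fold test values reported for LogisticRegressionCV. The sketch also assumes roc_auc was scored on hard class predictions, in which case it equals balanced accuracy.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Derive the metrics named in the log from binary confusion counts.

    Hypothetical helper for illustration only.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F-score: harmonic mean of precision and recall
    pr_sum = precision + recall
    fscore = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # Jaccard index (jcc) of the positive class
    jcc = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    # Matthews correlation coefficient
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Assumption: with hard predictions, ROC AUC reduces to balanced accuracy
    roc_auc = (recall + specificity) / 2
    return {
        "mcc": mcc,
        "accuracy": accuracy,
        "fscore": fscore,
        "precision": precision,
        "recall": recall,
        "roc_auc": roc_auc,
        "jcc": jcc,
    }


# Inferred counts for the first test fold (an assumption, see above)
fold1 = binary_metrics(tp=28, fp=4, tn=25, fn=0)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

If the inferred counts are right, the high recall and lower precision in that fold come entirely from four false positives and no false negatives.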
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.0210526 0.01111865 0.01016021 0.01017928 0.00997281 0.01037288
0.01188016 0.01034904 0.0100739 0.01008177]
mean value: 0.011524128913879394
key: score_time
value: [0.01222992 0.00975513 0.00912452 0.00903606 0.00887966 0.00971317
0.01039076 0.00900006 0.00902438 0.00893545]
mean value: 0.009608912467956542
key: test_mcc
value: [0.75492611 0.69397486 0.75462449 0.58076493 0.50128041 0.75047877
0.75047877 0.57142857 0.68250015 0.72168784]
mean value: 0.6762144912656113
key: train_mcc
value: [0.75216564 0.74385846 0.73986336 0.73178133 0.76786532 0.72461164
0.69640469 0.7442387 0.79728008 0.70868339]
mean value: 0.740675262273998
key: test_accuracy
value: [0.87719298 0.84210526 0.87719298 0.78947368 0.75 0.875
0.875 0.78571429 0.83928571 0.85714286]
mean value: 0.8368107769423558
key: train_accuracy
value: [0.87573964 0.87179487 0.86982249 0.86587771 0.88385827 0.86220472
0.84448819 0.87204724 0.8976378 0.85433071]
mean value: 0.8697801643137804
key: test_fscore
value: [0.87719298 0.82352941 0.88135593 0.78571429 0.75862069 0.87719298
0.87272727 0.78571429 0.83018868 0.86666667]
mean value: 0.8358903188603343
key: train_fscore
value: [0.87861272 0.87378641 0.87109375 0.86614173 0.88499025 0.86381323
0.83227176 0.87329435 0.90114068 0.85490196]
mean value: 0.8700046844178336
key: test_precision
value: [0.86206897 0.91304348 0.86666667 0.81481481 0.73333333 0.86206897
0.88888889 0.78571429 0.88 0.8125 ]
mean value: 0.8419099398713341
key: train_precision
value: [0.86037736 0.86206897 0.86100386 0.8627451 0.87644788 0.85384615
0.90322581 0.86486486 0.87132353 0.8515625 ]
mean value: 0.8667466014073157
key: test_recall
value: [0.89285714 0.75 0.89655172 0.75862069 0.78571429 0.89285714
0.85714286 0.78571429 0.78571429 0.92857143]
mean value: 0.8333743842364532
key: train_recall
value: [0.8976378 0.88582677 0.88142292 0.86956522 0.89370079 0.87401575
0.77165354 0.88188976 0.93307087 0.85826772]
mean value: 0.8747051134418474
key: test_roc_auc
value: [0.87746305 0.84051724 0.87684729 0.79002463 0.75 0.875
0.875 0.78571429 0.83928571 0.85714286]
mean value: 0.8366995073891625
key: train_roc_auc
value: [0.87569637 0.87176714 0.86984532 0.86588497 0.88385827 0.86220472
0.84448819 0.87204724 0.8976378 0.85433071]
mean value: 0.8697760729513554
key: test_jcc
value: [0.78125 0.7 0.78787879 0.64705882 0.61111111 0.78125
0.77419355 0.64705882 0.70967742 0.76470588]
mean value: 0.7204184396143599
key: train_jcc
value: [0.78350515 0.77586207 0.7716263 0.76388889 0.79370629 0.76027397
0.71272727 0.77508651 0.8200692 0.74657534]
mean value: 0.7703321000916057
MCC on Blind test: 0.43
Accuracy on Blind test: 0.78
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01048732 0.01034641 0.01061344 0.01354074 0.01033831 0.01015806
0.01025128 0.01018333 0.01019001 0.01029396]
mean value: 0.010640287399291992
key: score_time
value: [0.00899577 0.00907207 0.00908756 0.00907373 0.00891471 0.00884938
0.00887942 0.00890255 0.00902867 0.00936127]
mean value: 0.00901651382446289
key: test_mcc
value: [0.72242731 0.61405719 0.58076493 0.47413793 0.39310793 0.60753044
0.64951905 0.75047877 0.58501794 0.57142857]
mean value: 0.5948470056906343
key: train_mcc
value: [0.61736329 0.62938349 0.62938349 0.63709364 0.64961133 0.59933628
0.63787438 0.59872224 0.62622211 0.63009708]
mean value: 0.6255087330440435
key: test_accuracy
value: [0.84210526 0.80701754 0.78947368 0.73684211 0.69642857 0.80357143
0.82142857 0.875 0.78571429 0.78571429]
mean value: 0.794329573934837
key: train_accuracy
value: [0.8086785 0.81459566 0.81459566 0.81854043 0.82480315 0.7992126
0.81889764 0.7992126 0.81299213 0.81496063]
mean value: 0.8126488996567737
key: test_fscore
value: [0.86153846 0.8 0.78571429 0.73684211 0.70175439 0.80701754
0.83333333 0.87272727 0.76 0.78571429]
mean value: 0.7944641674115358
key: train_fscore
value: [0.8086785 0.812749 0.81640625 0.81746032 0.82445759 0.79352227
0.8203125 0.796 0.81553398 0.812749 ]
mean value: 0.8117869417892003
key: test_precision
value: [0.75675676 0.81481481 0.81481481 0.75 0.68965517 0.79310345
0.78125 0.88888889 0.86363636 0.78571429]
mean value: 0.7938634545315579
key: train_precision
value: [0.81027668 0.82258065 0.80694981 0.82071713 0.82608696 0.81666667
0.81395349 0.80894309 0.8045977 0.82258065]
mean value: 0.8153352810729207
key: test_recall
value: [1. 0.78571429 0.75862069 0.72413793 0.71428571 0.82142857
0.89285714 0.85714286 0.67857143 0.78571429]
mean value: 0.8018472906403941
key: train_recall
value: [0.80708661 0.80314961 0.82608696 0.81422925 0.82283465 0.77165354
0.82677165 0.78346457 0.82677165 0.80314961]
mean value: 0.8085198095297377
key: test_roc_auc
value: [0.84482759 0.80665025 0.79002463 0.73706897 0.69642857 0.80357143
0.82142857 0.875 0.78571429 0.78571429]
mean value: 0.7946428571428572
key: train_roc_auc
value: [0.80868165 0.81461828 0.81461828 0.81853195 0.82480315 0.7992126
0.81889764 0.7992126 0.81299213 0.81496063]
mean value: 0.8126528897326569
key: test_jcc
value: [0.75675676 0.66666667 0.64705882 0.58333333 0.54054054 0.67647059
0.71428571 0.77419355 0.61290323 0.64705882]
mean value: 0.6619268021070678
key: train_jcc
value: [0.67880795 0.68456376 0.68976898 0.69127517 0.70134228 0.65771812
0.69536424 0.66112957 0.68852459 0.68456376]
mean value: 0.6833058407846723
MCC on Blind test: 0.2
Accuracy on Blind test: 0.69
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01001644 0.01116848 0.01114178 0.01109457 0.01129532 0.01162243
0.0112431 0.01212406 0.01101089 0.01048851]
mean value: 0.01112055778503418
key: score_time
value: [0.01606083 0.01568818 0.01301765 0.01356721 0.01365757 0.01359797
0.01383781 0.01748276 0.01388144 0.01317573]
mean value: 0.01439671516418457
key: test_mcc
value: [0.7366424 0.6166424 0.68434084 0.6317806 0.5118907 0.58501794
0.53605627 0.68250015 0.58501794 0.46697379]
mean value: 0.6036863009651918
key: train_mcc
value: [0.76398832 0.74554603 0.75880927 0.76806178 0.79775247 0.78489793
0.76354997 0.76417218 0.73925749 0.79155948]
mean value: 0.767759492902757
key: test_accuracy
value: [0.85964912 0.80701754 0.84210526 0.80701754 0.75 0.78571429
0.76785714 0.83928571 0.78571429 0.73214286]
mean value: 0.7976503759398497
key: train_accuracy
value: [0.87771203 0.86982249 0.87573964 0.8816568 0.8976378 0.88779528
0.87992126 0.87992126 0.86811024 0.89370079]
mean value: 0.8812017580642656
key: test_fscore
value: [0.87096774 0.79245283 0.84745763 0.83076923 0.77419355 0.80645161
0.77192982 0.84745763 0.76 0.74576271]
mean value: 0.8047442754846815
key: train_fscore
value: [0.88644689 0.87777778 0.88354898 0.88764045 0.90151515 0.89579525
0.88555347 0.88598131 0.87382298 0.8988764 ]
mean value: 0.8876958654685702
key: test_precision
value: [0.79411765 0.84 0.83333333 0.75 0.70588235 0.73529412
0.75862069 0.80645161 0.86363636 0.70967742]
mean value: 0.7797013536529993
key: train_precision
value: [0.82876712 0.82867133 0.82986111 0.84341637 0.86861314 0.83617747
0.84587814 0.84341637 0.83754513 0.85714286]
mean value: 0.841948903606986
key: test_recall
value: [0.96428571 0.75 0.86206897 0.93103448 0.85714286 0.89285714
0.78571429 0.89285714 0.67857143 0.78571429]
mean value: 0.8400246305418719
key: train_recall
value: [0.95275591 0.93307087 0.94466403 0.93675889 0.93700787 0.96456693
0.92913386 0.93307087 0.91338583 0.94488189]
mean value: 0.9389296940649218
key: test_roc_auc
value: [0.8614532 0.80603448 0.84174877 0.80480296 0.75 0.78571429
0.76785714 0.83928571 0.78571429 0.73214286]
mean value: 0.797475369458128
key: train_roc_auc
value: [0.87756372 0.86969749 0.87587532 0.88176527 0.8976378 0.88779528
0.87992126 0.87992126 0.86811024 0.89370079]
mean value: 0.8811988422395818
key: test_jcc
value: [0.77142857 0.65625 0.73529412 0.71052632 0.63157895 0.67567568
0.62857143 0.73529412 0.61290323 0.59459459]
mean value: 0.6752116994528734
key: train_jcc
value: [0.79605263 0.78217822 0.79139073 0.7979798 0.82068966 0.81125828
0.79461279 0.79530201 0.77591973 0.81632653]
mean value: 0.7981710380264788
MCC on Blind test: 0.24
Accuracy on Blind test: 0.7
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.03288817 0.02851772 0.02310157 0.02252603 0.02221966 0.02230144
0.02194214 0.02222419 0.0223546 0.02229118]
mean value: 0.024036669731140138
key: score_time
value: [0.01327085 0.01344895 0.0130353 0.01236677 0.01236677 0.01230669
0.01217031 0.01236486 0.01230979 0.01227903]
mean value: 0.012591934204101563
key: test_mcc
value: [0.80817326 0.8951918 0.82880708 0.79682005 0.61065803 0.72168784
0.79385662 0.82195294 0.64450339 0.77459667]
mean value: 0.7696247676639518
key: train_mcc
value: [0.82431719 0.86987986 0.8390677 0.85106594 0.84756752 0.84464326
0.86681377 0.83968318 0.87412415 0.84725158]
mean value: 0.8504414138602343
key: test_accuracy
value: [0.89473684 0.94736842 0.9122807 0.89473684 0.80357143 0.85714286
0.89285714 0.91071429 0.82142857 0.875 ]
mean value: 0.8809837092731829
key: train_accuracy
value: [0.9112426 0.93491124 0.91913215 0.92504931 0.92322835 0.92125984
0.93307087 0.91929134 0.93700787 0.92322835]
mean value: 0.924742191989315
key: test_fscore
value: [0.90322581 0.94545455 0.91803279 0.90322581 0.81355932 0.86666667
0.9 0.9122807 0.81481481 0.88888889]
mean value: 0.8866149339401671
key: train_fscore
value: [0.91428571 0.93542074 0.92069632 0.92664093 0.92514395 0.92395437
0.93436293 0.92130518 0.9375 0.92485549]
mean value: 0.9264165644110587
key: test_precision
value: [0.82352941 0.96296296 0.875 0.84848485 0.77419355 0.8125
0.84375 0.89655172 0.84615385 0.8 ]
mean value: 0.8483126341891392
key: train_precision
value: [0.88560886 0.92996109 0.90151515 0.90566038 0.90262172 0.89338235
0.91666667 0.8988764 0.93023256 0.90566038]
mean value: 0.9070185556903059
key: test_recall
value: [1. 0.92857143 0.96551724 0.96551724 0.85714286 0.92857143
0.96428571 0.92857143 0.78571429 1. ]
mean value: 0.9323891625615763
key: train_recall
value: [0.94488189 0.94094488 0.94071146 0.9486166 0.9488189 0.95669291
0.95275591 0.94488189 0.94488189 0.94488189]
mean value: 0.9468068220721422
key: test_roc_auc
value: [0.89655172 0.94704433 0.91133005 0.89347291 0.80357143 0.85714286
0.89285714 0.91071429 0.82142857 0.875 ]
mean value: 0.8809113300492611
key: train_roc_auc
value: [0.91117612 0.93489932 0.91917463 0.9250957 0.92322835 0.92125984
0.93307087 0.91929134 0.93700787 0.92322835]
mean value: 0.924743238616912
key: test_jcc
value: [0.82352941 0.89655172 0.84848485 0.82352941 0.68571429 0.76470588
0.81818182 0.83870968 0.6875 0.8 ]
mean value: 0.7986907059820592
key: train_jcc
value: [0.84210526 0.87867647 0.85304659 0.86330935 0.86071429 0.85865724
0.87681159 0.85409253 0.88235294 0.86021505]
mean value: 0.8629981326609936
MCC on Blind test: 0.66
Accuracy on Blind test: 0.88
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.15022874 2.26373672 2.10535789 1.289114 2.09971786 2.04547882
2.0241437 2.11169195 2.62805915 2.49180293]
mean value: 2.120933175086975
key: score_time
value: [0.01304388 0.01381183 0.02200556 0.01263905 0.01398277 0.03204012
0.01388478 0.01397943 0.02038717 0.0143764 ]
mean value: 0.01701509952545166
key: test_mcc
value: [0.96551724 0.89988258 0.96547546 0.83703659 0.79385662 0.89342711
0.96490128 0.96490128 0.93094934 0.8660254 ]
mean value: 0.908197290053634
key: train_mcc
value: [0.99606293 0.99211042 1. 0.99211042 1. 0.99607071
0.99607071 1. 0.99607071 0.99212598]
mean value: 0.9960621896302765
key: test_accuracy
value: [0.98245614 0.94736842 0.98245614 0.9122807 0.89285714 0.94642857
0.98214286 0.98214286 0.96428571 0.92857143]
mean value: 0.9520989974937343
key: train_accuracy
value: [0.99802761 0.99605523 1. 0.99605523 1. 0.9980315
0.9980315 1. 0.9980315 0.99606299]
mean value: 0.9980295547376105
key: test_fscore
value: [0.98245614 0.94915254 0.98305085 0.92063492 0.9 0.94545455
0.98245614 0.98245614 0.96551724 0.93333333]
mean value: 0.9544511851685249
key: train_fscore
value: [0.99803536 0.99606299 1. 0.99604743 1. 0.99803536
0.99803536 1. 0.99803536 0.99606299]
mean value: 0.9980314868913049
key: test_precision
value: [0.96551724 0.90322581 0.96666667 0.85294118 0.84375 0.96296296
0.96551724 0.96551724 0.93333333 0.875 ]
mean value: 0.9234431670023096
key: train_precision
value: [0.99607843 0.99606299 1. 0.99604743 1. 0.99607843
0.99607843 1. 0.99607843 0.99606299]
mean value: 0.9972487140572204
key: test_recall
value: [1. 1. 1. 1. 0.96428571 0.92857143
1. 1. 1. 1. ]
mean value: 0.9892857142857143
key: train_recall
value: [1. 0.99606299 1. 0.99604743 1. 1.
1. 1. 1. 0.99606299]
mean value: 0.9988173415082008
key: test_roc_auc
value: [0.98275862 0.94827586 0.98214286 0.91071429 0.89285714 0.94642857
0.98214286 0.98214286 0.96428571 0.92857143]
mean value: 0.9520320197044335
key: train_roc_auc
value: [0.99802372 0.99605521 1. 0.99605521 1. 0.9980315
0.9980315 1. 0.9980315 0.99606299]
mean value: 0.9980291618686005
key: test_jcc
value: [0.96551724 0.90322581 0.96666667 0.85294118 0.81818182 0.89655172
0.96551724 0.96551724 0.93333333 0.875 ]
mean value: 0.9142452249379882
key: train_jcc
value: [0.99607843 0.99215686 1. 0.99212598 1. 0.99607843
0.99607843 1. 0.99607843 0.99215686]
mean value: 0.9960753435232361
MCC on Blind test: 0.8
Accuracy on Blind test: 0.93
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02960896 0.02699208 0.02417684 0.03163671 0.02562284 0.02379894
0.03221059 0.02416635 0.02949929 0.02938581]
mean value: 0.02770984172821045
key: score_time
value: [0.01288438 0.01086068 0.00921392 0.0142808 0.0104208 0.01039267
0.01026678 0.00972581 0.01522899 0.01317906]
mean value: 0.01164538860321045
key: test_mcc
value: [0.96551724 1. 0.96547546 0.96547546 0.89342711 0.96490128
0.96490128 0.93094934 0.89342711 0.96490128]
mean value: 0.9508975559462645
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 1. 0.98245614 0.98245614 0.94642857 0.98214286
0.98214286 0.96428571 0.94642857 0.98214286]
mean value: 0.975093984962406
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98245614 1. 0.98305085 0.98305085 0.94736842 0.98245614
0.98245614 0.96551724 0.94736842 0.98245614]
mean value: 0.9756180339803336
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96551724 1. 0.96666667 0.96666667 0.93103448 0.96551724
0.96551724 0.93333333 0.93103448 0.96551724]
mean value: 0.9590804597701149
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.96428571 1.
1. 1. 0.96428571 1. ]
mean value: 0.9928571428571429
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98275862 1. 0.98214286 0.98214286 0.94642857 0.98214286
0.98214286 0.96428571 0.94642857 0.98214286]
mean value: 0.9750615763546799
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96551724 1. 0.96666667 0.96666667 0.9 0.96551724
0.96551724 0.93333333 0.9 0.96551724]
mean value: 0.9528735632183908
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.96
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.16124463 0.13249683 0.12512231 0.123142 0.12262487 0.12115049
0.1241188 0.12215066 0.12319398 0.12175512]
mean value: 0.12769997119903564
key: score_time
value: [0.02338743 0.01843238 0.02020693 0.01953816 0.01848555 0.01992226
0.01976848 0.01945233 0.0198977 0.0199244 ]
mean value: 0.019901561737060546
key: test_mcc
value: [1. 1. 0.96547546 0.96547546 0.89342711 0.93094934
1. 0.96490128 0.89342711 0.93094934]
mean value: 0.9544605091626567
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 0.98245614 0.98245614 0.94642857 0.96428571
1. 0.98214286 0.94642857 0.96428571]
mean value: 0.9768483709273182
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 0.98305085 0.98305085 0.94736842 0.96296296
1. 0.98245614 0.94736842 0.96551724]
mean value: 0.9771774881713668
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.96666667 0.96666667 0.93103448 1.
1. 0.96551724 0.93103448 0.93333333]
mean value: 0.9694252873563218
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9857142857142858
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 0.98214286 0.98214286 0.94642857 0.96428571
1. 0.98214286 0.94642857 0.96428571]
mean value: 0.9767857142857144
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 0.96666667 0.96666667 0.9 0.92857143
1. 0.96551724 0.9 0.93333333]
mean value: 0.9560755336617406
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.6
Accuracy on Blind test: 0.88
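Note: the keys printed for each model (fit_time, score_time, test_mcc, train_mcc, ...) have the shape of a scikit-learn cross_validate result with a scoring dictionary and return_train_score=True. The sketch below shows one way such output could be produced. It is not the project's script: the data are synthetic, only three of the logged column names are reused, and the scorer definitions, the fold shuffling and the seed are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (assumption): column names borrowed from the log,
# values random.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.normal(size=n),
    "ddg_foldx": rng.normal(size=n),
    "ss_class": rng.choice(["helix", "strand", "loop"], size=n),
})
y = (X["ligand_distance"] + 0.5 * rng.normal(size=n) > 0).astype(int)

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class"]

# Same layout as the logged pipeline: scale numeric columns, one-hot encode
# categorical columns, then fit the classifier.
pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        remainder="passthrough",
        transformers=[("num", MinMaxScaler(), num_cols),
                      ("cat", OneHotEncoder(), cat_cols)])),
    ("model", ExtraTreesClassifier(random_state=42)),
])

# Scorer names chosen to reproduce the logged keys (assumption); roc_auc is
# computed here from hard class predictions.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": "jaccard",
}

# Ten folds, as in the log; shuffling and the seed are assumptions.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Keeping the scaler and encoder inside the Pipeline means they are refit on the training part of every fold, so nothing from the held-out fold reaches the preprocessing step.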
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01061201 0.01145554 0.01236582 0.01151681 0.01169515 0.01167917
0.01173377 0.01138639 0.01171923 0.0116694 ]
mean value: 0.011583328247070312
key: score_time
value: [0.00956845 0.00910449 0.01007557 0.00983143 0.00965309 0.00978756
0.00995731 0.00933337 0.01028538 0.00958109]
mean value: 0.009717774391174317
key: test_mcc
value: [0.80817326 0.86851042 0.89952865 0.86789789 0.64116714 0.85714286
0.89802651 0.8660254 0.89342711 0.83484711]
mean value: 0.8434746352564939
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.89473684 0.92982456 0.94736842 0.92982456 0.80357143 0.92857143
0.94642857 0.92857143 0.94642857 0.91071429]
mean value: 0.9166040100250626
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90322581 0.93333333 0.95081967 0.93548387 0.83076923 0.92857143
0.94915254 0.93333333 0.94736842 0.91803279]
mean value: 0.9230090425868587
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.82352941 0.875 0.90625 0.87878788 0.72972973 0.92857143
0.90322581 0.875 0.93103448 0.84848485]
mean value: 0.8699613586548824
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9857142857142858
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.89655172 0.93103448 0.94642857 0.92857143 0.80357143 0.92857143
0.94642857 0.92857143 0.94642857 0.91071429]
mean value: 0.9166871921182267
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.82352941 0.875 0.90625 0.87878788 0.71052632 0.86666667
0.90322581 0.875 0.9 0.84848485]
mean value: 0.8587470927945187
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.53
Accuracy on Blind test: 0.86
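Note: the blind-test lines above report accuracy 0.86 alongside MCC 0.53. MCC uses all four cells of the confusion matrix, so it can sit well below accuracy when one class dominates. The sketch below works the formula by hand. The counts are hypothetical and chosen only for illustration; the log does not print the blind-test confusion matrix.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when a row or column of the matrix is empty, a common convention.
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts (assumption): 15 positives, 85 negatives.
tp, tn, fp, fn = 10, 80, 5, 5
acc_example = accuracy(tp, tn, fp, fn)
mcc_example = mcc(tp, tn, fp, fn)
print(f"accuracy={acc_example:.2f} mcc={mcc_example:.2f}")  # accuracy=0.90 mcc=0.61
```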
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.88762093 1.83851171 1.77748513 1.75392604 1.80487251 1.68222833
1.85387468 1.78196788 1.73018646 1.72484589]
mean value: 1.7835519552230834
key: score_time
value: [0.10137248 0.10127664 0.10406446 0.0981307 0.09335041 0.10171962
0.10252476 0.09720516 0.09547591 0.09470224]
mean value: 0.09898223876953124
key: test_mcc
value: [0.96551724 1. 0.96547546 0.96547546 0.89342711 0.93094934
1. 0.96490128 0.89342711 0.96490128]
mean value: 0.954407427810863
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 1. 0.98245614 0.98245614 0.94642857 0.96428571
1. 0.98214286 0.94642857 0.98214286]
mean value: 0.9768796992481202
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98245614 1. 0.98305085 0.98305085 0.94736842 0.96296296
1. 0.98245614 0.94736842 0.98245614]
mean value: 0.9771169921036111
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96551724 1. 0.96666667 0.96666667 0.93103448 1.
1. 0.96551724 0.93103448 0.96551724]
mean value: 0.9691954022988506
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9857142857142858
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98275862 1. 0.98214286 0.98214286 0.94642857 0.96428571
1. 0.98214286 0.94642857 0.98214286]
mean value: 0.9768472906403942
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96551724 1. 0.96666667 0.96666667 0.9 0.92857143
1. 0.96551724 0.9 0.96551724]
mean value: 0.9558456486042693
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.79
Accuracy on Blind test: 0.93
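Note: every metric in this log is printed as a "key:" line, a "value:" array that may wrap onto a second line, and a "mean value:" line. The sketch below is one possible way to read such a block back into Python and re-check the reported mean. The function name parse_cv_metrics is hypothetical; the sample text is copied from the Random Forest block above.

```python
def parse_cv_metrics(log_text):
    """Return {metric: {"values": [...], "mean": float}} from a log excerpt."""
    metrics = {}
    key, buffer, collecting = None, [], False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("key:"):
            key = line.split(":", 1)[1].strip()
            buffer, collecting = [], False
        elif line.startswith("mean value:") and key is not None:
            # Drop the array brackets and split on whitespace.
            tokens = " ".join(buffer).replace("[", " ").replace("]", " ").split()
            metrics[key] = {
                "values": [float(t) for t in tokens],
                "mean": float(line.split(":", 1)[1]),
            }
            key, collecting = None, False
        elif line.startswith("value:"):
            buffer.append(line.split(":", 1)[1])
            collecting = True
        elif collecting:
            # Continuation line of a wrapped array.
            buffer.append(line)
    return metrics


SAMPLE = """\
key: test_mcc
value: [0.96551724 1. 0.96547546 0.96547546 0.89342711 0.93094934
1. 0.96490128 0.89342711 0.96490128]
mean value: 0.954407427810863
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
"""

parsed = parse_cv_metrics(SAMPLE)
for name, entry in parsed.items():
    recomputed = sum(entry["values"]) / len(entry["values"])
    print(name, len(entry["values"]), round(recomputed, 6), round(entry["mean"], 6))
```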
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.95136046 0.98831916 0.95716357 0.93647122 0.99228835 1.02972484
0.97567821 0.97857833 0.98092842 0.9740994 ]
mean value: 0.9764611959457398
key: score_time
value: [0.24091625 0.2508409 0.16018391 0.22381854 0.2695148 0.21784067
0.22880816 0.24814248 0.26897693 0.26349545]
mean value: 0.23725380897521972
key: test_mcc
value: [0.96551724 0.93202124 0.96547546 1. 0.93094934 0.93094934
1. 0.96490128 0.93094934 0.93094934]
mean value: 0.9551712565684981
key: train_mcc
value: [0.98046604 0.9685613 0.98046755 0.97660594 0.98437404 0.97665048
0.98050495 0.98437404 0.98050495 0.98050495]
mean value: 0.979301423744519
key: test_accuracy
value: [0.98245614 0.96491228 0.98245614 1. 0.96428571 0.96428571
1. 0.98214286 0.96428571 0.96428571]
mean value: 0.9769110275689223
key: train_accuracy
value: [0.99013807 0.98422091 0.99013807 0.98816568 0.99212598 0.98818898
0.99015748 0.99212598 0.99015748 0.99015748]
mean value: 0.9895576107720263
key: test_fscore
value: [0.98245614 0.96296296 0.98305085 1. 0.96551724 0.96296296
1. 0.98245614 0.96551724 0.96551724]
mean value: 0.9770440778223238
key: train_fscore
value: [0.99025341 0.984375 0.99021526 0.98828125 0.9921875 0.98832685
0.99025341 0.9921875 0.99025341 0.99025341]
mean value: 0.9896587007661066
key: test_precision
value: [0.96551724 1. 0.96666667 1. 0.93333333 1.
1. 0.96551724 0.93333333 0.93333333]
mean value: 0.9697701149425287
key: train_precision
value: [0.98069498 0.97674419 0.98062016 0.97683398 0.98449612 0.97692308
0.98069498 0.98449612 0.98069498 0.98069498]
mean value: 0.9802893565684263
key: test_recall
value: [1. 0.92857143 1. 1. 1. 0.92857143
1. 1. 1. 1. ]
mean value: 0.9857142857142858
key: train_recall
value: [1. 0.99212598 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9992125984251968
key: test_roc_auc
value: [0.98275862 0.96428571 0.98214286 1. 0.96428571 0.96428571
1. 0.98214286 0.96428571 0.96428571]
mean value: 0.9768472906403941
key: train_roc_auc
value: [0.99011858 0.98420528 0.99015748 0.98818898 0.99212598 0.98818898
0.99015748 0.99212598 0.99015748 0.99015748]
mean value: 0.9895583704210887
key: test_jcc
value: [0.96551724 0.92857143 0.96666667 1. 0.93333333 0.92857143
1. 0.96551724 0.93333333 0.93333333]
mean value: 0.9554844006568145
key: train_jcc
value: [0.98069498 0.96923077 0.98062016 0.97683398 0.98449612 0.97692308
0.98069498 0.98449612 0.98069498 0.98069498]
mean value: 0.9795380148868521
MCC on Blind test: 0.83
Accuracy on Blind test: 0.94
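Note: the FutureWarning raised for this model says that max_features='auto' is deprecated and that 'sqrt' keeps the same behaviour for RandomForestClassifier. The sketch below shows the same estimator settings with 'sqrt' substituted. The data are synthetic, and n_estimators and n_jobs are reduced from the logged values (1000 and 10) purely to keep the example quick; both reductions are assumptions, not the project's configuration.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumption), only to make the sketch self-contained.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

rf = RandomForestClassifier(
    max_features="sqrt",   # replaces the deprecated 'auto'
    min_samples_leaf=5,
    n_estimators=100,      # the log uses 1000; reduced here for speed
    n_jobs=1,              # the log uses 10; reduced here for portability
    oob_score=True,
    random_state=42,
)

# Turn any max_features deprecation warning into an error to confirm it is gone.
with warnings.catch_warnings():
    warnings.filterwarnings("error", message=".*max_features.*",
                            category=FutureWarning)
    rf.fit(X, y)

print("OOB score:", rf.oob_score_)
```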
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01103711 0.01071525 0.01178098 0.01100349 0.01181769 0.0120666
0.01212502 0.01112056 0.01161766 0.01164937]
mean value: 0.011493372917175292
key: score_time
value: [0.01010919 0.00939178 0.00976348 0.01021886 0.01012588 0.0094111
0.00998759 0.00937343 0.00957465 0.00951076]
mean value: 0.009746670722961426
key: test_mcc
value: [0.72242731 0.61405719 0.58076493 0.47413793 0.39310793 0.60753044
0.64951905 0.75047877 0.58501794 0.57142857]
mean value: 0.5948470056906343
key: train_mcc
value: [0.61736329 0.62938349 0.62938349 0.63709364 0.64961133 0.59933628
0.63787438 0.59872224 0.62622211 0.63009708]
mean value: 0.6255087330440435
key: test_accuracy
value: [0.84210526 0.80701754 0.78947368 0.73684211 0.69642857 0.80357143
0.82142857 0.875 0.78571429 0.78571429]
mean value: 0.794329573934837
key: train_accuracy
value: [0.8086785 0.81459566 0.81459566 0.81854043 0.82480315 0.7992126
0.81889764 0.7992126 0.81299213 0.81496063]
mean value: 0.8126488996567737
key: test_fscore
value: [0.86153846 0.8 0.78571429 0.73684211 0.70175439 0.80701754
0.83333333 0.87272727 0.76 0.78571429]
mean value: 0.7944641674115358
key: train_fscore
value: [0.8086785 0.812749 0.81640625 0.81746032 0.82445759 0.79352227
0.8203125 0.796 0.81553398 0.812749 ]
mean value: 0.8117869417892003
key: test_precision
value: [0.75675676 0.81481481 0.81481481 0.75 0.68965517 0.79310345
0.78125 0.88888889 0.86363636 0.78571429]
mean value: 0.7938634545315579
key: train_precision
value: [0.81027668 0.82258065 0.80694981 0.82071713 0.82608696 0.81666667
0.81395349 0.80894309 0.8045977 0.82258065]
mean value: 0.8153352810729207
key: test_recall
value: [1. 0.78571429 0.75862069 0.72413793 0.71428571 0.82142857
0.89285714 0.85714286 0.67857143 0.78571429]
mean value: 0.8018472906403941
key: train_recall
value: [0.80708661 0.80314961 0.82608696 0.81422925 0.82283465 0.77165354
0.82677165 0.78346457 0.82677165 0.80314961]
mean value: 0.8085198095297377
key: test_roc_auc
value: [0.84482759 0.80665025 0.79002463 0.73706897 0.69642857 0.80357143
0.82142857 0.875 0.78571429 0.78571429]
mean value: 0.7946428571428572
key: train_roc_auc
value: [0.80868165 0.81461828 0.81461828 0.81853195 0.82480315 0.7992126
0.81889764 0.7992126 0.81299213 0.81496063]
mean value: 0.8126528897326569
key: test_jcc
value: [0.75675676 0.66666667 0.64705882 0.58333333 0.54054054 0.67647059
0.71428571 0.77419355 0.61290323 0.64705882]
mean value: 0.6619268021070678
key: train_jcc
value: [0.67880795 0.68456376 0.68976898 0.69127517 0.70134228 0.65771812
0.69536424 0.66112957 0.68852459 0.68456376]
mean value: 0.6833058407846723
MCC on Blind test: 0.2
Accuracy on Blind test: 0.69
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.0972321 0.07731676 0.07649851 0.0709219 0.07835388 0.07469583
0.0731678 0.22985172 0.07314754 0.074085 ]
mean value: 0.09252710342407226
key: score_time
value: [0.01207376 0.01114559 0.01213479 0.01066709 0.01160574 0.01146913
0.01133513 0.01144457 0.01144123 0.01136351]
mean value: 0.011468052864074707
key: test_mcc
value: [0.96551724 1. 0.96547546 0.96547546 0.92857143 0.96490128
1. 0.96490128 0.93094934 0.96490128]
mean value: 0.9650692763304416
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 1. 0.98245614 0.98245614 0.96428571 0.98214286
1. 0.98214286 0.96428571 0.98214286]
mean value: 0.9822368421052632
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98245614 1. 0.98305085 0.98305085 0.96428571 0.98245614
1. 0.98245614 0.96551724 0.98245614]
mean value: 0.9825729211983787
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96551724 1. 0.96666667 0.96666667 0.96428571 0.96551724
1. 0.96551724 0.93333333 0.96551724]
mean value: 0.9693021346469622
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.96428571 1.
1. 1. 1. 1. ]
mean value: 0.9964285714285714
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98275862 1. 0.98214286 0.98214286 0.96428571 0.98214286
1. 0.98214286 0.96428571 0.98214286]
mean value: 0.9822044334975369
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96551724 1. 0.96666667 0.96666667 0.93103448 0.96551724
1. 0.96551724 0.93333333 0.96551724]
mean value: 0.9659770114942529
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.88
Accuracy on Blind test: 0.96
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.045156 0.05143094 0.04170799 0.07557249 0.04485083 0.05265903
0.04891658 0.07405901 0.04442978 0.07700086]
mean value: 0.05557835102081299
key: score_time
value: [0.01947165 0.01236486 0.01648664 0.01348376 0.02269197 0.01272631
0.02324986 0.01244068 0.01243663 0.01249099]
mean value: 0.015784335136413575
key: test_mcc
value: [0.96551724 0.85960591 0.79110556 0.89952865 0.85933785 1.
0.82618439 0.89802651 0.89802651 0.8660254 ]
mean value: 0.8863358024357655
key: train_mcc
value: [0.96055211 0.96847134 0.95661443 0.95661511 0.9645744 0.96062992
0.9645744 0.96850394 0.96062992 0.96062992]
mean value: 0.9621795507209824
key: test_accuracy
value: [0.98245614 0.92982456 0.89473684 0.94736842 0.92857143 1.
0.91071429 0.94642857 0.94642857 0.92857143]
mean value: 0.9415100250626567
key: train_accuracy
value: [0.98027613 0.98422091 0.97830375 0.97830375 0.98228346 0.98031496
0.98228346 0.98425197 0.98031496 0.98031496]
mean value: 0.9810868316016711
key: test_fscore
value: [0.98245614 0.92857143 0.9 0.95081967 0.92592593 1.
0.91525424 0.94915254 0.94915254 0.93333333]
mean value: 0.9434665822346611
key: train_fscore
value: [0.98031496 0.98431373 0.97821782 0.97830375 0.98231827 0.98031496
0.98231827 0.98425197 0.98031496 0.98031496]
mean value: 0.98109836480702
key: test_precision
value: [0.96551724 0.92857143 0.87096774 0.90625 0.96153846 1.
0.87096774 0.90322581 0.90322581 0.875 ]
mean value: 0.9185264228263395
key: train_precision
value: [0.98031496 0.98046875 0.98015873 0.97637795 0.98039216 0.98031496
0.98039216 0.98425197 0.98031496 0.98031496]
mean value: 0.9803301557663748
key: test_recall
value: [1. 0.92857143 0.93103448 1. 0.89285714 1.
0.96428571 1. 1. 1. ]
mean value: 0.9716748768472907
key: train_recall
value: [0.98031496 0.98818898 0.97628458 0.98023715 0.98425197 0.98031496
0.98425197 0.98425197 0.98031496 0.98031496]
mean value: 0.9818726463539884
key: test_roc_auc
value: [0.98275862 0.92980296 0.89408867 0.94642857 0.92857143 1.
0.91071429 0.94642857 0.94642857 0.92857143]
mean value: 0.9413793103448276
key: train_roc_auc
value: [0.98027606 0.98421307 0.97829977 0.97830755 0.98228346 0.98031496
0.98228346 0.98425197 0.98031496 0.98031496]
mean value: 0.9810860228439825
key: test_jcc
value: [0.96551724 0.86666667 0.81818182 0.90625 0.86206897 1.
0.84375 0.90322581 0.90322581 0.875 ]
mean value: 0.8943886304648262
key: train_jcc
value: [0.96138996 0.96911197 0.95736434 0.95752896 0.96525097 0.96138996
0.96525097 0.96899225 0.96138996 0.96138996]
mean value: 0.962905929184999
MCC on Blind test: 0.61
Accuracy on Blind test: 0.86
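Each model's block in this log uses the same `key:` / `value:` / `mean value:` layout, so the per-fold scores can be recovered from the text. The following is a minimal sketch, not part of the original pipeline: the function name `parse_metric_blocks` and the number-matching regular expression are illustrative assumptions, and the sample lines are the LDA `test_mcc` entry copied from the block above.

```python
import re

# Matches plain and signed decimals such as "0.96551724", "1." or "-0.25".
NUMBER = re.compile(r"[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?")


def parse_metric_blocks(lines):
    """Collect {key: {"values": [...], "mean": float}} from log lines."""
    metrics = {}
    key = None
    collecting = False  # True while inside a (possibly wrapped) value array
    for raw in lines:
        line = raw.strip()
        if line.startswith("key:"):
            key = line.split(":", 1)[1].strip()
            metrics[key] = {"values": [], "mean": None}
            collecting = False
        elif line.startswith("mean value:") and key is not None:
            metrics[key]["mean"] = float(line.split(":", 1)[1])
            collecting = False
        elif line.startswith("value:") and key is not None:
            collecting = True
            metrics[key]["values"].extend(float(x) for x in NUMBER.findall(line))
        elif collecting:
            # Continuation line of a wrapped NumPy array printout.
            metrics[key]["values"].extend(float(x) for x in NUMBER.findall(line))
    return metrics


sample = """key: test_mcc
value: [0.96551724 0.85960591 0.79110556 0.89952865 0.85933785 1.
0.82618439 0.89802651 0.89802651 0.8660254 ]
mean value: 0.8863358024357655
""".splitlines()

parsed = parse_metric_blocks(sample)
fold_values = parsed["test_mcc"]["values"]
recomputed_mean = sum(fold_values) / len(fold_values)
print(len(fold_values), recomputed_mean, parsed["test_mcc"]["mean"])
```

Recomputing the mean from the parsed fold values is a quick check that a wrapped array was read completely.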
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01605272 0.01064062 0.01053286 0.01062799 0.01046133 0.01057339
0.01031685 0.01173782 0.01179981 0.01043272]
mean value: 0.011317610740661621
key: score_time
value: [0.01308727 0.0094192 0.00935388 0.00926232 0.0092082 0.00946975
0.00921726 0.00949192 0.00977159 0.00906062]
mean value: 0.009734201431274413
key: test_mcc
value: [0.70694956 0.79682005 0.61405719 0.54433498 0.35805744 0.57735027
0.61065803 0.4645821 0.61706091 0.61065803]
mean value: 0.59005285401838
key: train_mcc
value: [0.65069271 0.60967718 0.61360065 0.56269586 0.67365136 0.57949966
0.61061966 0.63009708 0.67887215 0.56699945]
mean value: 0.6176405758990616
key: test_accuracy
value: [0.84210526 0.89473684 0.80701754 0.77192982 0.67857143 0.78571429
0.80357143 0.73214286 0.80357143 0.80357143]
mean value: 0.7922932330827067
key: train_accuracy
value: [0.82445759 0.80473373 0.80670611 0.78106509 0.83661417 0.78937008
0.80511811 0.81496063 0.83858268 0.78346457]
mean value: 0.8085072760875305
key: test_fscore
value: [0.85714286 0.88461538 0.81355932 0.77192982 0.68965517 0.8
0.81355932 0.73684211 0.78431373 0.81355932]
mean value: 0.7965177035588487
key: train_fscore
value: [0.83111954 0.80776699 0.80859375 0.78529981 0.83945841 0.79462572
0.80851064 0.81712062 0.84410646 0.78515625]
mean value: 0.812175819990016
key: test_precision
value: [0.77142857 0.95833333 0.8 0.78571429 0.66666667 0.75
0.77419355 0.72413793 0.86956522 0.77419355]
mean value: 0.7874233102342838
key: train_precision
value: [0.8021978 0.79693487 0.7992278 0.76893939 0.82509506 0.7752809
0.79467681 0.80769231 0.81617647 0.77906977]
mean value: 0.7965291168982057
key: test_recall
value: [0.96428571 0.82142857 0.82758621 0.75862069 0.71428571 0.85714286
0.85714286 0.75 0.71428571 0.85714286]
mean value: 0.812192118226601
key: train_recall
value: [0.86220472 0.81889764 0.81818182 0.80237154 0.85433071 0.81496063
0.82283465 0.82677165 0.87401575 0.79133858]
mean value: 0.8285907690392456
key: test_roc_auc
value: [0.84421182 0.89347291 0.80665025 0.77216749 0.67857143 0.78571429
0.80357143 0.73214286 0.80357143 0.80357143]
mean value: 0.7923645320197044
key: train_roc_auc
value: [0.82438299 0.80470574 0.8067287 0.78110703 0.83661417 0.78937008
0.80511811 0.81496063 0.83858268 0.78346457]
mean value: 0.8085034701689957
key: test_jcc
value: [0.75 0.79310345 0.68571429 0.62857143 0.52631579 0.66666667
0.68571429 0.58333333 0.64516129 0.68571429]
mean value: 0.6650294813786413
key: train_jcc
value: [0.71103896 0.67752443 0.67868852 0.64649682 0.72333333 0.65923567
0.67857143 0.69078947 0.73026316 0.64630225]
mean value: 0.6842244043960553
MCC on Blind test: 0.59
Accuracy on Blind test: 0.83
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01944375 0.02495122 0.02653742 0.0216639 0.03189659 0.02602649
0.02330875 0.02988434 0.02322054 0.03054476]
mean value: 0.02574777603149414
key: score_time
value: [0.01058817 0.01182628 0.01213336 0.01234889 0.01249361 0.01251531
0.012465 0.01516986 0.0150609 0.02132058]
mean value: 0.013592195510864259
key: test_mcc
value: [0.89988258 0.8951918 0.7366424 0.89952865 0.78772636 0.89342711
0.52223297 0.92857143 0.70082556 0.8660254 ]
mean value: 0.8130054254678469
key: train_mcc
value: [0.92712676 0.95292731 0.93792915 0.90342654 0.96074906 0.95670033
0.6780635 0.97250878 0.89014893 0.96853396]
mean value: 0.9148114321896691
key: test_accuracy
value: [0.94736842 0.94736842 0.85964912 0.94736842 0.89285714 0.94642857
0.71428571 0.96428571 0.83928571 0.92857143]
mean value: 0.8987468671679197
key: train_accuracy
value: [0.96252465 0.97633136 0.96844181 0.95069034 0.98031496 0.97834646
0.81496063 0.98622047 0.94291339 0.98425197]
mean value: 0.9544996039696222
key: test_fscore
value: [0.94915254 0.94545455 0.84615385 0.95081967 0.89655172 0.94545455
0.77777778 0.96428571 0.81632653 0.93333333]
mean value: 0.9025310231713968
key: train_fscore
value: [0.96380952 0.9766537 0.96761134 0.95219885 0.98046875 0.978389
0.84385382 0.98613861 0.93995859 0.98418972]
mean value: 0.9573271907059853
key: test_precision
value: [0.90322581 0.96296296 0.95652174 0.90625 0.86666667 0.96296296
0.63636364 0.96428571 0.95238095 0.875 ]
mean value: 0.8986620441204943
key: train_precision
value: [0.93357934 0.96538462 0.99170124 0.92222222 0.97286822 0.97647059
0.72988506 0.99203187 0.99126638 0.98809524]
mean value: 0.9463504767125346
key: test_recall
value: [1. 0.92857143 0.75862069 1. 0.92857143 0.92857143
1. 0.96428571 0.71428571 1. ]
mean value: 0.9222906403940887
key: train_recall
value: [0.99606299 0.98818898 0.94466403 0.98418972 0.98818898 0.98031496
1. 0.98031496 0.89370079 0.98031496]
mean value: 0.973594036911394
key: test_roc_auc
value: [0.94827586 0.94704433 0.8614532 0.94642857 0.89285714 0.94642857
0.71428571 0.96428571 0.83928571 0.92857143]
mean value: 0.8988916256157635
key: train_roc_auc
value: [0.96245837 0.97630793 0.96839501 0.95075628 0.98031496 0.97834646
0.81496063 0.98622047 0.94291339 0.98425197]
mean value: 0.9544925461392425
key: test_jcc
value: [0.90322581 0.89655172 0.73333333 0.90625 0.8125 0.89655172
0.63636364 0.93103448 0.68965517 0.875 ]
mean value: 0.8280465879596859
key: train_jcc
value: [0.93014706 0.95437262 0.9372549 0.90875912 0.96168582 0.95769231
0.72988506 0.97265625 0.88671875 0.9688716 ]
mean value: 0.920804349269515
MCC on Blind test: 0.76
Accuracy on Blind test: 0.91
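To compare the three linear-style models logged so far, it can help to line up the mean training MCC, the mean cross-validated MCC and the blind-test MCC. The sketch below only rearranges numbers already reported above (LDA, Multinomial, Passive Aggresive); the dictionary layout and the two gap names are illustrative assumptions, not output of the original script.

```python
# Mean values copied from the log blocks above; blind-test MCC as printed.
reported = {
    "LDA": {"train_mcc": 0.9621795507209824,
            "test_mcc": 0.8863358024357655,
            "blind_mcc": 0.61},
    "Multinomial": {"train_mcc": 0.6176405758990616,
                    "test_mcc": 0.59005285401838,
                    "blind_mcc": 0.59},
    "Passive Aggresive": {"train_mcc": 0.9148114321896691,
                          "test_mcc": 0.8130054254678469,
                          "blind_mcc": 0.76},
}

gaps = {
    name: {
        # Large values suggest the model fits the training folds more
        # closely than it generalises within cross-validation.
        "train_minus_cv": m["train_mcc"] - m["test_mcc"],
        # Large values suggest cross-validation is optimistic relative
        # to the held-out blind test set.
        "cv_minus_blind": m["test_mcc"] - m["blind_mcc"],
    }
    for name, m in reported.items()
}

for name, g in gaps.items():
    print(f"{name:18s} train-cv={g['train_minus_cv']:.3f} "
          f"cv-blind={g['cv_minus_blind']:.3f}")

largest_train_cv_gap = max(gaps, key=lambda n: gaps[n]["train_minus_cv"])
largest_cv_blind_gap = max(gaps, key=lambda n: gaps[n]["cv_minus_blind"])
```

On these figures the largest train-to-CV gap belongs to Passive Aggresive, and the largest CV-to-blind drop belongs to LDA.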
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02200437 0.02010751 0.01893258 0.01999164 0.02031374 0.01846647
0.02055311 0.02092481 0.01913881 0.02208757]
mean value: 0.020252060890197755
key: score_time
value: [0.01222563 0.01219106 0.01215363 0.0128088 0.01377892 0.01212907
0.01220179 0.0122571 0.01221776 0.01227355]
mean value: 0.01242372989654541
key: test_mcc
value: [0.96551724 0.86789789 0.83703659 0.74822828 0.72168784 0.71611487
0.72168784 0.96490128 0.73127242 0.89802651]
mean value: 0.8172370767372027
key: train_mcc
value: [0.77941536 0.90393669 0.89862256 0.80222203 0.96463421 0.86255889
0.91852667 0.9645744 0.84762399 0.96137528]
mean value: 0.8903490079543654
key: test_accuracy
value: [0.98245614 0.92982456 0.9122807 0.85964912 0.85714286 0.85714286
0.85714286 0.98214286 0.85714286 0.94642857]
mean value: 0.9041353383458646
key: train_accuracy
value: [0.87968442 0.95069034 0.94674556 0.89151874 0.98228346 0.92716535
0.95866142 0.98228346 0.91929134 0.98031496]
mean value: 0.9418639053254438
key: test_fscore
value: [0.98245614 0.92307692 0.92063492 0.87878788 0.86666667 0.86206897
0.86666667 0.98245614 0.84 0.94915254]
mean value: 0.9071966844424932
key: train_fscore
value: [0.86474501 0.94887526 0.94934334 0.90196078 0.98238748 0.93186004
0.9596929 0.98231827 0.91295117 0.98069498]
mean value: 0.941482922079735
key: test_precision
value: [0.96551724 1. 0.85294118 0.78378378 0.8125 0.83333333
0.8125 0.96551724 0.95454545 0.90322581]
mean value: 0.8883864037343394
key: train_precision
value: [0.98984772 0.98723404 0.90357143 0.82142857 0.9766537 0.87543253
0.93632959 0.98039216 0.99078341 0.96212121]
mean value: 0.9423794347876031
key: test_recall
value: [1. 0.85714286 1. 1. 0.92857143 0.89285714
0.92857143 1. 0.75 1. ]
mean value: 0.9357142857142857
key: train_recall
value: [0.76771654 0.91338583 1. 1. 0.98818898 0.99606299
0.98425197 0.98425197 0.84645669 1. ]
mean value: 0.9480314960629921
key: test_roc_auc
value: [0.98275862 0.92857143 0.91071429 0.85714286 0.85714286 0.85714286
0.85714286 0.98214286 0.85714286 0.94642857]
mean value: 0.9036330049261084
key: train_roc_auc
value: [0.8799057 0.95076406 0.94685039 0.89173228 0.98228346 0.92716535
0.95866142 0.98228346 0.91929134 0.98031496]
mean value: 0.9419252435342815
key: test_jcc
value: [0.96551724 0.85714286 0.85294118 0.78378378 0.76470588 0.75757576
0.76470588 0.96551724 0.72413793 0.90322581]
mean value: 0.8339253559923585
key: train_jcc
value: [0.76171875 0.90272374 0.90357143 0.82142857 0.96538462 0.87241379
0.92250923 0.96525097 0.83984375 0.96212121]
mean value: 0.8916966046361052
MCC on Blind test: 0.71
Accuracy on Blind test: 0.89
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.18998051 0.18064475 0.18306255 0.18004155 0.18018317 0.17893577
0.17555499 0.179106 0.17361403 0.17628264]
mean value: 0.1797405958175659
key: score_time
value: [0.01593828 0.01705146 0.01724339 0.01699996 0.01718593 0.01622796
0.01678467 0.01644039 0.01647544 0.01549101]
mean value: 0.016583847999572753
key: test_mcc
value: [0.96551724 1. 0.96547546 0.96547546 0.96490128 0.96490128
1. 0.96490128 0.93094934 0.96490128]
mean value: 0.9687022616087002
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 1. 0.98245614 0.98245614 0.98214286 0.98214286
1. 0.98214286 0.96428571 0.98214286]
mean value: 0.9840225563909775
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98245614 1. 0.98305085 0.98305085 0.98245614 0.98245614
1. 0.98245614 0.96551724 0.98245614]
mean value: 0.984389963804895
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96551724 1. 0.96666667 0.96666667 0.96551724 0.96551724
1. 0.96551724 0.93333333 0.96551724]
mean value: 0.9694252873563218
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98275862 1. 0.98214286 0.98214286 0.98214286 0.98214286
1. 0.98214286 0.96428571 0.98214286]
mean value: 0.9839901477832513
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96551724 1. 0.96666667 0.96666667 0.96551724 0.96551724
1. 0.96551724 0.93333333 0.96551724]
mean value: 0.9694252873563218
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.88
Accuracy on Blind test: 0.96
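The first-fold AdaBoost scores above (accuracy 0.98245614, precision 0.96551724, recall 1.0) are consistent with a 57-sample test fold containing 28 true positives, 28 true negatives, one false positive and no false negatives. Those counts are an inference from the reported values; the log does not print a confusion matrix. Under that assumption, this short sketch reproduces the remaining first-fold metrics from the standard definitions of each score:

```python
from math import sqrt

# Assumed confusion-matrix counts for fold 1 (inferred, not logged).
tp, fp, fn, tn = 28, 1, 0, 28

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * precision * recall / (precision + recall)
jaccard = tp / (tp + fp + fn)
specificity = tn / (tn + fp)
balanced_accuracy = (recall + specificity) / 2

# Matthews correlation coefficient for a binary confusion matrix.
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("fscore", fscore),
                    ("jcc", jaccard), ("balanced_accuracy", balanced_accuracy),
                    ("mcc", mcc)]:
    print(f"{name:18s} {value:.8f}")
```

The balanced-accuracy value equals the logged `test_roc_auc` for this fold, which would be expected if that score is computed from hard class predictions rather than probabilities; the log itself does not confirm how it was computed.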
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.06405163 0.07544374 0.07800102 0.0659802 0.08432174 0.09123611
0.07245827 0.06432152 0.07139039 0.0639205 ]
mean value: 0.07311251163482665
key: score_time
value: [0.02064395 0.02726412 0.02178264 0.02308631 0.03058195 0.0379889
0.01936054 0.02460885 0.0287199 0.02765727]
mean value: 0.026169443130493165
key: test_mcc
value: [0.96551724 0.93202124 0.96547546 1. 0.89342711 0.96490128
0.92857143 0.96490128 0.93094934 0.93094934]
mean value: 0.9476713715472729
key: train_mcc
value: [1. 0.99211042 1. 0.99214142 1. 1.
0.99212598 1. 0.99607071 0.99215674]
mean value: 0.996460528395497
key: test_accuracy
value: [0.98245614 0.96491228 0.98245614 1. 0.94642857 0.98214286
0.96428571 0.98214286 0.96428571 0.96428571]
mean value: 0.9733395989974937
key: train_accuracy
value: [1. 0.99605523 1. 0.99605523 1. 1.
0.99606299 1. 0.9980315 0.99606299]
mean value: 0.9982267933963875
key: test_fscore
value: [0.98245614 0.96296296 0.98305085 1. 0.94736842 0.98245614
0.96428571 0.98245614 0.96551724 0.96551724]
mean value: 0.9736070849570189
key: train_fscore
value: [1. 0.99606299 1. 0.99606299 1. 1.
0.99606299 1. 0.99803536 0.99607843]
mean value: 0.9982302771208262
key: test_precision
value: [0.96551724 1. 0.96666667 1. 0.93103448 0.96551724
0.96428571 0.96551724 0.93333333 0.93333333]
mean value: 0.96252052545156
key: train_precision
value: [1. 0.99606299 1. 0.99215686 1. 1.
0.99606299 1. 0.99607843 0.9921875 ]
mean value: 0.9972548778369615
key: test_recall
value: [1. 0.92857143 1. 1. 0.96428571 1.
0.96428571 1. 1. 1. ]
mean value: 0.9857142857142858
key: train_recall
value: [1. 0.99606299 1. 1. 1. 1.
0.99606299 1. 1. 1. ]
mean value: 0.9992125984251968
key: test_roc_auc
value: [0.98275862 0.96428571 0.98214286 1. 0.94642857 0.98214286
0.96428571 0.98214286 0.96428571 0.96428571]
mean value: 0.9732758620689655
key: train_roc_auc
value: [1. 0.99605521 1. 0.99606299 1. 1.
0.99606299 1. 0.9980315 0.99606299]
mean value: 0.9982275683918956
key: test_jcc
value: [0.96551724 0.92857143 0.96666667 1. 0.9 0.96551724
0.93103448 0.96551724 0.93333333 0.93333333]
mean value: 0.9489490968801314
key: train_jcc
value: [1. 0.99215686 1. 0.99215686 1. 1.
0.99215686 1. 0.99607843 0.9921875 ]
mean value: 0.9964736519607843
MCC on Blind test: 0.85
Accuracy on Blind test: 0.94
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.17205048 0.19767261 0.18131852 0.16802168 0.2220397 0.16332126
0.26538157 0.2428298 0.2475462 0.21324587]
mean value: 0.2073427677154541
key: score_time
value: [0.03067493 0.02527881 0.0278089 0.01557803 0.02717137 0.0271771
0.02811146 0.03504801 0.04026318 0.02716613]
mean value: 0.028427791595458985
key: test_mcc
value: [0.77903565 0.8953202 0.96547546 0.80685836 0.76225171 0.85714286
0.8660254 0.93094934 0.89342711 0.77459667]
mean value: 0.8531082753337808
key: train_mcc
value: [0.98434291 0.98823457 0.98434388 0.98823511 0.99215674 0.99215674
0.98437404 0.98437404 0.99215674 0.98437404]
mean value: 0.9874748809761109
key: test_accuracy
value: [0.87719298 0.94736842 0.98245614 0.89473684 0.875 0.92857143
0.92857143 0.96428571 0.94642857 0.875 ]
mean value: 0.9219611528822055
key: train_accuracy
value: [0.99211045 0.99408284 0.99211045 0.99408284 0.99606299 0.99606299
0.99212598 0.99212598 0.99606299 0.99212598]
mean value: 0.9936953516905062
key: test_fscore
value: [0.88888889 0.94736842 0.98305085 0.90625 0.8852459 0.92857143
0.93333333 0.96551724 0.94736842 0.88888889]
mean value: 0.9274483372264085
key: train_fscore
value: [0.9921875 0.99412916 0.99215686 0.99410609 0.99607843 0.99607843
0.9921875 0.9921875 0.99607843 0.9921875 ]
mean value: 0.9937377405748746
key: test_precision
value: [0.8 0.93103448 0.96666667 0.82857143 0.81818182 0.92857143
0.875 0.93333333 0.93103448 0.8 ]
mean value: 0.8812393640841917
key: train_precision
value: [0.98449612 0.98832685 0.9844358 0.98828125 0.9921875 0.9921875
0.98449612 0.98449612 0.9921875 0.98449612]
mean value: 0.9875590892038428
key: test_recall
value: [1. 0.96428571 1. 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9821428571428572
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.87931034 0.9476601 0.98214286 0.89285714 0.875 0.92857143
0.92857143 0.96428571 0.94642857 0.875 ]
mean value: 0.9219827586206897
key: train_roc_auc
value: [0.99209486 0.99407115 0.99212598 0.99409449 0.99606299 0.99606299
0.99212598 0.99212598 0.99606299 0.99212598]
mean value: 0.9936953409479942
key: test_jcc
value: [0.8 0.9 0.96666667 0.82857143 0.79411765 0.86666667
0.875 0.93333333 0.9 0.8 ]
mean value: 0.8664355742296919
key: train_jcc
value: [0.98449612 0.98832685 0.9844358 0.98828125 0.9921875 0.9921875
0.98449612 0.98449612 0.9921875 0.98449612]
mean value: 0.9875590892038428
MCC on Blind test: 0.38
Accuracy on Blind test: 0.8
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.66994047 0.66400075 0.67747116 0.74977636 0.67632365 0.645926
0.66423178 0.66457534 0.69733119 0.67097402]
mean value: 0.6780550718307495
key: score_time
value: [0.00979495 0.01027083 0.01547742 0.00971317 0.0095067 0.0092721
0.01001692 0.0094285 0.01009512 0.00964975]
mean value: 0.01032254695892334
key: test_mcc
value: [0.96551724 1. 0.96547546 1. 0.93094934 0.96490128
0.96490128 0.96490128 0.93094934 0.89802651]
mean value: 0.958562172459794
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 1. 0.98245614 1. 0.96428571 0.98214286
0.98214286 0.98214286 0.96428571 0.94642857]
mean value: 0.9786340852130325
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98245614 1. 0.98305085 1. 0.96551724 0.98245614
0.98245614 0.98245614 0.96551724 0.94915254]
mean value: 0.9793062433992638
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96551724 1. 0.96666667 1. 0.93333333 0.96551724
0.96551724 0.96551724 0.93333333 0.90322581]
mean value: 0.9598628105302188
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98275862 1. 0.98214286 1. 0.96428571 0.98214286
0.98214286 0.98214286 0.96428571 0.94642857]
mean value: 0.9786330049261084
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96551724 1. 0.96666667 1. 0.93333333 0.96551724
0.96551724 0.96551724 0.93333333 0.90322581]
mean value: 0.9598628105302188
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.96
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.0364995 0.05236554 0.06891942 0.04245543 0.03577256 0.04559731
0.04729986 0.04501486 0.05125642 0.03263569]
mean value: 0.04578166007995606
key: score_time
value: [0.02079725 0.01780081 0.01469088 0.01432419 0.01472092 0.01861978
0.02223301 0.02089095 0.01528358 0.01549315]
mean value: 0.017485451698303223
key: test_mcc
value: [0.9321832 0.96547546 1. 0.96547546 0.96490128 0.93094934
1. 1. 0.96490128 0.96490128]
mean value: 0.9688787292752474
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.98245614 1. 0.98245614 0.98214286 0.96428571
1. 1. 0.98214286 0.98214286]
mean value: 0.9840538847117795
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96551724 0.98181818 1. 0.98305085 0.98181818 0.96296296
1. 1. 0.98181818 0.98245614]
mean value: 0.9839441737605323
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93333333 1. 1. 0.96666667 1. 1.
1. 1. 1. 0.96551724]
mean value: 0.986551724137931
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96428571 1. 1. 0.96428571 0.92857143
1. 1. 0.96428571 1. ]
mean value: 0.9821428571428572
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96551724 0.98214286 1. 0.98214286 0.98214286 0.96428571
1. 1. 0.98214286 0.98214286]
mean value: 0.9840517241379311
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93333333 0.96428571 1. 0.96666667 0.96428571 0.92857143
1. 1. 0.96428571 0.96551724]
mean value: 0.9686945812807882
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.05
Accuracy on Blind test: 0.78
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.03992987 0.04090619 0.02985525 0.03153992 0.01640797 0.01648879
0.02689624 0.02909088 0.0170362 0.01887155]
mean value: 0.026702284812927246
key: score_time
value: [0.02908468 0.02943969 0.02950621 0.03354526 0.01250839 0.01258111
0.02266932 0.02381325 0.01700115 0.0155673 ]
mean value: 0.022571635246276856
key: test_mcc
value: [0.89988258 0.8615634 0.82512315 0.93202124 0.82195294 0.96490128
0.92857143 0.89342711 0.82195294 0.83484711]
mean value: 0.8784243193263463
key: train_mcc
value: [0.96450468 0.95667331 0.94872473 0.95661511 0.96853396 0.95278544
0.9606597 0.96062992 0.95670033 0.95670033]
mean value: 0.9582527517536223
key: test_accuracy
value: [0.94736842 0.92982456 0.9122807 0.96491228 0.91071429 0.98214286
0.96428571 0.94642857 0.91071429 0.91071429]
mean value: 0.937938596491228
key: train_accuracy
value: [0.98224852 0.97830375 0.97435897 0.97830375 0.98425197 0.97637795
0.98031496 0.98031496 0.97834646 0.97834646]
mean value: 0.9791167746043579
key: test_fscore
value: [0.94915254 0.92592593 0.9122807 0.96666667 0.9122807 0.98245614
0.96428571 0.94736842 0.90909091 0.91803279]
mean value: 0.9387540510139624
key: train_fscore
value: [0.98224852 0.97847358 0.97425743 0.97830375 0.98431373 0.97647059
0.98039216 0.98031496 0.97830375 0.978389 ]
mean value: 0.9791467451988494
key: test_precision
value: [0.90322581 0.96153846 0.92857143 0.93548387 0.89655172 0.96551724
0.96428571 0.93103448 0.92592593 0.84848485]
mean value: 0.9260619504501596
key: train_precision
value: [0.98418972 0.97276265 0.97619048 0.97637795 0.98046875 0.97265625
0.9765625 0.98031496 0.98023715 0.97647059]
mean value: 0.9776231001196349
key: test_recall
value: [1. 0.89285714 0.89655172 1. 0.92857143 1.
0.96428571 0.96428571 0.89285714 1. ]
mean value: 0.9539408866995074
key: train_recall
value: [0.98031496 0.98425197 0.97233202 0.98023715 0.98818898 0.98031496
0.98425197 0.98031496 0.97637795 0.98031496]
mean value: 0.9806899878621892
key: test_roc_auc
value: [0.94827586 0.92918719 0.91256158 0.96428571 0.91071429 0.98214286
0.96428571 0.94642857 0.91071429 0.91071429]
mean value: 0.9379310344827587
key: train_roc_auc
value: [0.98225234 0.97829199 0.97435498 0.97830755 0.98425197 0.97637795
0.98031496 0.98031496 0.97834646 0.97834646]
mean value: 0.9791159627773801
key: test_jcc
value: [0.90322581 0.86206897 0.83870968 0.93548387 0.83870968 0.96551724
0.93103448 0.9 0.83333333 0.84848485]
mean value: 0.8856567903731418
key: train_jcc
value: [0.96511628 0.95785441 0.94980695 0.95752896 0.96911197 0.95402299
0.96153846 0.96138996 0.95752896 0.95769231]
mean value: 0.9591591238303347
MCC on Blind test: 0.76
Accuracy on Blind test: 0.91
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.45232368 0.31628871 0.2526288 0.39126348 0.50805736 0.3298955
0.37645435 0.22237539 0.32188296 0.34562302]
mean value: 0.35167932510375977
key: score_time
value: [0.01331973 0.02620149 0.01252651 0.02661324 0.02001739 0.02062368
0.02325225 0.03817391 0.01921773 0.02543306]
mean value: 0.022537899017333985
key: test_mcc
value: [0.89988258 0.8615634 0.82512315 0.93202124 0.82195294 0.96490128
0.92857143 0.89342711 0.82195294 0.83484711]
mean value: 0.8784243193263463
key: train_mcc
value: [0.96450468 0.95667331 0.94872473 0.95661511 0.96853396 0.95278544
0.9606597 0.96062992 0.95670033 0.95670033]
mean value: 0.9582527517536223
key: test_accuracy
value: [0.94736842 0.92982456 0.9122807 0.96491228 0.91071429 0.98214286
0.96428571 0.94642857 0.91071429 0.91071429]
mean value: 0.937938596491228
key: train_accuracy
value: [0.98224852 0.97830375 0.97435897 0.97830375 0.98425197 0.97637795
0.98031496 0.98031496 0.97834646 0.97834646]
mean value: 0.9791167746043579
key: test_fscore
value: [0.94915254 0.92592593 0.9122807 0.96666667 0.9122807 0.98245614
0.96428571 0.94736842 0.90909091 0.91803279]
mean value: 0.9387540510139624
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:188: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore
value: [0.98224852 0.97847358 0.97425743 0.97830375 0.98431373 0.97647059
0.98039216 0.98031496 0.97830375 0.978389 ]
mean value: 0.9791467451988494
key: test_precision
value: [0.90322581 0.96153846 0.92857143 0.93548387 0.89655172 0.96551724
0.96428571 0.93103448 0.92592593 0.84848485]
mean value: 0.9260619504501596
key: train_precision
value: [0.98418972 0.97276265 0.97619048 0.97637795 0.98046875 0.97265625
0.9765625 0.98031496 0.98023715 0.97647059]
mean value: 0.9776231001196349
key: test_recall
value: [1. 0.89285714 0.89655172 1. 0.92857143 1.
0.96428571 0.96428571 0.89285714 1. ]
mean value: 0.9539408866995074
key: train_recall
value: [0.98031496 0.98425197 0.97233202 0.98023715 0.98818898 0.98031496
0.98425197 0.98031496 0.97637795 0.98031496]
mean value: 0.9806899878621892
key: test_roc_auc
value: [0.94827586 0.92918719 0.91256158 0.96428571 0.91071429 0.98214286
0.96428571 0.94642857 0.91071429 0.91071429]
mean value: 0.9379310344827587
key: train_roc_auc
value: [0.98225234 0.97829199 0.97435498 0.97830755 0.98425197 0.97637795
0.98031496 0.98031496 0.97834646 0.97834646]
mean value: 0.9791159627773801
key: test_jcc
value: [0.90322581 0.86206897 0.83870968 0.93548387 0.83870968 0.96551724
0.93103448 0.9 0.83333333 0.84848485]
mean value: 0.8856567903731418
key: train_jcc
value: [0.96511628 0.95785441 0.94980695 0.95752896 0.96911197 0.95402299
0.96153846 0.96138996 0.95752896 0.95769231]
mean value: 0.9591591238303347
MCC on Blind test: 0.76
Accuracy on Blind test: 0.91