LSHTM_analysis/scripts/ml/log_pnca_sl.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_sl.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 424
PASS: my_features_df and aa_df successfully combined
nrows: 424
ncols: 265
count of NULL values before imputation
or_mychisq 102
log10_or_mychisq 102
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 166
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 173
-------------------------------------------------------------
Successfully split data according to scaling law: 1/np.sqrt(x_ncols)
Train data size: (170, 173)
Test data size: 0.07602859212697055 (15, 173)
y_train numbers: Counter({1: 105, 0: 65})
y_train ratio: 0.6190476190476191
y_test_numbers: Counter({1: 9, 0: 6})
y_test ratio: 0.6666666666666666
-------------------------------------------------------------
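The split sizes reported above can be checked by hand. The following is a minimal sketch, not the code in ml_data_sl.py: the function name `scaling_law_split` is hypothetical, and rounding the test count up is an assumption, chosen because it reproduces the logged sizes (170 train, 15 test) from 185 rows and 173 columns.

```python
import math


def scaling_law_split(n_rows, n_cols):
    """Return (test_fraction, n_train, n_test) for a 1/sqrt(n_cols) split.

    Assumption: the number of test rows is rounded up.
    """
    test_fraction = 1 / math.sqrt(n_cols)
    n_test = math.ceil(test_fraction * n_rows)
    return test_fraction, n_rows - n_test, n_test


# Values taken from the log: 185 rows in the ML source data, 173 feature columns.
frac, n_train, n_test = scaling_law_split(185, 173)
print(f"test fraction: {frac}")
print(f"train rows: {n_train}, test rows: {n_test}")
```

With so many features relative to samples, the test fraction is small (about 7.6%), so the held-out set has only 15 rows and any test metric computed on it will be coarse.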
Simple Random OverSampling
Counter({0: 105, 1: 105})
(210, 173)
Simple Random UnderSampling
Counter({0: 65, 1: 65})
(130, 173)
Simple Combined Over and UnderSampling
Counter({0: 105, 1: 105})
(210, 173)
SMOTE_NC OverSampling
Counter({0: 105, 1: 105})
(210, 173)
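The class counts above follow directly from the training labels (105 of class 1, 65 of class 0). The sketch below illustrates plain random over- and undersampling using only the standard library; it is an illustration of the idea, not the resampling library the run used, and `resample_indices` is a hypothetical helper. SMOTE_NC, which synthesises new rows rather than duplicating existing ones, is not covered.

```python
import random
from collections import Counter


def resample_indices(labels, mode, seed=42):
    """Return row indices that balance the classes in `labels`.

    mode="over": duplicate minority rows up to the majority count.
    mode="under": randomly drop majority rows down to the minority count.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    sizes = [len(idx) for idx in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)
    chosen = []
    for idx in by_class.values():
        if len(idx) >= target:
            # Enough rows already: sample without replacement.
            chosen.extend(rng.sample(idx, target))
        else:
            # Too few rows: keep all, then draw extras with replacement.
            chosen.extend(idx + rng.choices(idx, k=target - len(idx)))
    return chosen


# Training label counts from the log.
y_train = [1] * 105 + [0] * 65
for mode in ("over", "under"):
    picked = resample_indices(y_train, mode)
    print(mode, Counter(y_train[i] for i in picked), len(picked))
```

Oversampling yields 210 rows and undersampling 130, matching the shapes logged above.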
#####################################################################
Running ML analysis: scaling law split
Gene name: pncA
Drug name: pyrazinamide
Output directory: /home/tanu/git/Data/pyrazinamide/output/ml/tts_sl/
Sanity checks:
ML source data size: (185, 173)
Total input features: (170, 173)
Target feature numbers: Counter({1: 105, 0: 65})
Target features ratio: 0.6190476190476191
#####################################################################
================================================================
Structural features (n): 34
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.030159 0.03234315 0.06500244 0.0333209 0.03272486 0.04780293
0.03167248 0.15353703 0.06054592 0.03304076]
mean value: 0.052014946937561035
key: score_time
value: [0.01172686 0.01196408 0.02330327 0.01240492 0.01344848 0.01356339
0.01245809 0.01469016 0.01229882 0.01356554]
mean value: 0.013942360877990723
key: test_mcc
value: [0.63262663 0.66299354 0.77151675 0.38122129 0.66299354 0.63262663
0.04351941 0.33371191 0.60385964 0.30389487]
mean value: 0.5028964212212766
key: train_mcc
value: [0.83628052 0.87638923 0.80724696 0.80552514 0.7938003 0.76492233
0.79554375 0.82448293 0.83762196 0.79554375]
mean value: 0.8137356868892114
key: test_accuracy
value: [0.82352941 0.82352941 0.88235294 0.70588235 0.82352941 0.82352941
0.52941176 0.70588235 0.82352941 0.70588235]
mean value: 0.7647058823529411
key: train_accuracy
value: [0.92156863 0.94117647 0.90849673 0.90849673 0.90196078 0.88888889
0.90196078 0.91503268 0.92156863 0.90196078]
mean value: 0.9111111111111111
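The per-fold values above are easier to interpret with a spread alongside the mean. The sketch below summarises the ten test_mcc folds copied from the log and compares them with the logged train_mcc mean. The choice of summary statistics is an assumption for illustration, not part of the original script.

```python
from statistics import mean, stdev

# Per-fold test MCC values copied from the log above.
test_mcc = [0.63262663, 0.66299354, 0.77151675, 0.38122129, 0.66299354,
            0.63262663, 0.04351941, 0.33371191, 0.60385964, 0.30389487]
# Mean train MCC as reported in the log.
train_mcc_mean = 0.8137356868892114

summary = {
    "mean": mean(test_mcc),
    "stdev": stdev(test_mcc),  # sample standard deviation across folds
    "min": min(test_mcc),
    "max": max(test_mcc),
    "train_test_gap": train_mcc_mean - mean(test_mcc),
}
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

A large gap between train and test MCC, together with a wide range across folds, suggests the model fits the training folds much better than unseen data, which is plausible with 173 features and only 17 rows per validation fold.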
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: test_fscore
value: [0.85714286 0.86956522 0.90909091 0.7826087 0.86956522 0.85714286
0.6 0.7826087 0.86956522 0.8 ]
mean value: 0.8197289666854884
key: train_fscore
value: [0.94 0.95431472 0.93 0.92929293 0.92537313 0.91370558
0.92462312 0.93467337 0.93939394 0.92462312]
mean value: 0.9315999905573705
key: test_precision
value: [0.81818182 0.76923077 0.83333333 0.69230769 0.76923077 0.9
0.66666667 0.75 0.83333333 0.71428571]
mean value: 0.7746570096570097
key: train_precision
value: [0.8952381 0.92156863 0.88571429 0.89320388 0.87735849 0.87378641
0.87619048 0.88571429 0.89423077 0.87619048]
mean value: 0.8879195797557542
key: test_recall
value: [0.9 1. 1. 0.9 1. 0.81818182
0.54545455 0.81818182 0.90909091 0.90909091]
mean value: 0.88
key: train_recall
value: [0.98947368 0.98947368 0.97894737 0.96842105 0.97894737 0.95744681
0.9787234 0.9893617 0.9893617 0.9787234 ]
mean value: 0.9798880179171333
key: test_roc_auc
value: [0.80714286 0.78571429 0.85714286 0.66428571 0.78571429 0.82575758
0.52272727 0.65909091 0.78787879 0.62121212]
mean value: 0.7316666666666667
key: train_roc_auc
value: [0.89990926 0.92577132 0.88602541 0.88938294 0.87740472 0.86855391
0.87919221 0.89298594 0.90146051 0.87919221]
mean value: 0.8899878429737624
key: test_jcc
value: [0.75 0.76923077 0.83333333 0.64285714 0.76923077 0.75
0.42857143 0.64285714 0.76923077 0.66666667]
mean value: 0.7021978021978023
key: train_jcc
value: [0.88679245 0.91262136 0.86915888 0.86792453 0.86111111 0.8411215
0.85981308 0.87735849 0.88571429 0.85981308]
mean value: 0.8721428769802886
MCC on Blind test: 0.61
Accuracy on Blind test: 0.8
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.90216947 0.87260532 1.08087659 0.74459434 1.0994246 1.10648322
1.21706295 0.96376276 0.86980605 0.99782419]
mean value: 0.9854609489440918
key: score_time
value: [0.01360488 0.01396418 0.01334357 0.01384592 0.01390672 0.01344657
0.01328254 0.0135169 0.01473761 0.01464629]
mean value: 0.013829517364501952
key: test_mcc
value: [0.51428571 0.50920105 0.30988989 0.51428571 0.50920105 0.88273483
0.2030906 0.48484848 0.74242424 0.48484848]
mean value: 0.5154810071962633
key: train_mcc
value: [1. 1. 1. 1. 0.91830889 0.98625704
0.90411865 1. 0.89069566 1. ]
mean value: 0.96993802415093
key: test_accuracy
value: [0.76470588 0.76470588 0.64705882 0.76470588 0.76470588 0.94117647
0.58823529 0.76470588 0.88235294 0.76470588]
mean value: 0.7647058823529411
key: train_accuracy
value: [1. 1. 1. 1. 0.96078431 0.99346405
0.95424837 1. 0.94771242 1. ]
mean value: 0.9856209150326798
key: test_fscore
value: [0.8 0.81818182 0.66666667 0.8 0.81818182 0.95238095
0.63157895 0.81818182 0.90909091 0.81818182]
mean value: 0.8032444748234222
key: train_fscore
value: [1. 1. 1. 1. 0.96938776 0.99470899
0.96373057 1. 0.95876289 1. ]
mean value: 0.988659020635716
key: test_precision
value: [0.8 0.75 0.75 0.8 0.75 1.
0.75 0.81818182 0.90909091 0.81818182]
mean value: 0.8145454545454546
key: train_precision
value: [1. 1. 1. 1. 0.94059406 0.98947368
0.93939394 1. 0.93 1. ]
mean value: 0.9799461683010406
key: test_recall
value: [0.8 0.9 0.6 0.8 0.9 0.90909091
0.54545455 0.81818182 0.90909091 0.81818182]
mean value: 0.8
key: train_recall
value: [1. 1. 1. 1. 1. 1. 0.9893617
1. 0.9893617 1. ]
mean value: 0.997872340425532
key: test_roc_auc
value: [0.75714286 0.73571429 0.65714286 0.75714286 0.73571429 0.95454545
0.60606061 0.74242424 0.87121212 0.74242424]
mean value: 0.755952380952381
key: train_roc_auc
value: [1. 1. 1. 1. 0.94827586 0.99152542
0.94383339 1. 0.93535882 1. ]
mean value: 0.9818993496400015
key: test_jcc
value: [0.66666667 0.69230769 0.5 0.66666667 0.69230769 0.90909091
0.46153846 0.69230769 0.83333333 0.69230769]
mean value: 0.6806526806526807
key: train_jcc
value: [1. 1. 1. 1. 0.94059406 0.98947368
0.93 1. 0.92079208 1. ]
mean value: 0.9780859822824388
MCC on Blind test: 0.58
Accuracy on Blind test: 0.8
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01363158 0.01329041 0.00920653 0.01015902 0.00940776 0.00897503
0.00879431 0.00902987 0.01002741 0.00941849]
mean value: 0.010194039344787598
key: score_time
value: [0.01313639 0.01162434 0.00912452 0.01039958 0.00888896 0.00871277
0.00869632 0.00936913 0.00976944 0.00974512]
mean value: 0.009946656227111817
key: test_mcc
value: [ 0.38122129 0.50920105 0.77151675 0.13241022 0.24688536 -0.01899343
0.22727273 -0.01899343 0.17069719 0.17069719]
mean value: 0.25719149163785127
key: train_mcc
value: [0.5048764 0.47629849 0.5048764 0.45884418 0.48537027 0.43135777
0.49226514 0.47721276 0.44691625 0.39720759]
mean value: 0.46752252677089945
key: test_accuracy
value: [0.70588235 0.76470588 0.88235294 0.58823529 0.64705882 0.58823529
0.64705882 0.58823529 0.64705882 0.64705882]
mean value: 0.6705882352941177
key: train_accuracy
value: [0.77124183 0.75816993 0.77124183 0.75163399 0.76470588 0.71895425
0.76470588 0.75816993 0.74509804 0.68627451]
mean value: 0.7490196078431373
key: test_fscore
value: [0.7826087 0.81818182 0.90909091 0.66666667 0.72727273 0.72
0.72727273 0.72 0.75 0.75 ]
mean value: 0.7571093544137023
key: train_fscore
value: [0.82233503 0.81218274 0.82233503 0.81 0.82524272 0.81385281
0.81818182 0.81407035 0.80597015 0.71084337]
mean value: 0.8055014016865908
key: test_precision
value: [0.69230769 0.75 0.83333333 0.63636364 0.66666667 0.64285714
0.72727273 0.64285714 0.69230769 0.69230769]
mean value: 0.6976273726273726
key: train_precision
value: [0.79411765 0.78431373 0.79411765 0.77142857 0.76576577 0.68613139
0.77884615 0.77142857 0.75700935 0.81944444]
mean value: 0.7722603259177057
key: test_recall
value: [0.9 0.9 1. 0.7 0.8 0.81818182
0.72727273 0.81818182 0.81818182 0.81818182]
mean value: 0.8300000000000001
key: train_recall
value: [0.85263158 0.84210526 0.85263158 0.85263158 0.89473684 1.
0.86170213 0.86170213 0.86170213 0.62765957]
mean value: 0.8507502799552071
key: test_roc_auc
value: [0.66428571 0.73571429 0.85714286 0.56428571 0.61428571 0.49242424
0.61363636 0.49242424 0.57575758 0.57575758]
mean value: 0.6185714285714285
key: train_roc_auc
value: [0.74528131 0.73139746 0.74528131 0.71941924 0.72323049 0.63559322
0.73593581 0.72746123 0.71051208 0.7036603 ]
mean value: 0.7177772440103329
key: test_jcc
value: [0.64285714 0.69230769 0.83333333 0.5 0.57142857 0.5625
0.57142857 0.5625 0.6 0.6 ]
mean value: 0.6136355311355312
key: train_jcc
value: [0.69827586 0.68376068 0.69827586 0.68067227 0.70247934 0.68613139
0.69230769 0.68644068 0.675 0.55140187]
mean value: 0.675474564194314
MCC on Blind test: 0.12
Accuracy on Blind test: 0.6
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00986075 0.01002407 0.00997305 0.01180744 0.01076865 0.01062441
0.01006889 0.00927901 0.01006889 0.00926876]
mean value: 0.010174393653869629
key: score_time
value: [0.00983071 0.00966644 0.00964904 0.01030135 0.01011753 0.00926757
0.00965786 0.00972462 0.00960636 0.00913143]
mean value: 0.009695291519165039
key: test_mcc
value: [ 0.23975611 0.38251843 0.63262663 0.02857143 0.50920105 0.2030906
-0.13241022 -0.28787879 0.13241022 0.17069719]
mean value: 0.18785826455207946
key: train_mcc
value: [0.42092813 0.39333516 0.32656704 0.46856319 0.39056476 0.41094842
0.47583844 0.38542713 0.43627743 0.37735366]
mean value: 0.40858033755597606
key: test_accuracy
value: [0.64705882 0.70588235 0.82352941 0.52941176 0.76470588 0.58823529
0.41176471 0.41176471 0.58823529 0.64705882]
mean value: 0.611764705882353
key: train_accuracy
value: [0.7254902 0.7124183 0.69281046 0.74509804 0.71895425 0.7254902
0.75163399 0.70588235 0.73202614 0.70588235]
mean value: 0.7215686274509804
key: test_fscore
value: [0.75 0.76190476 0.85714286 0.6 0.81818182 0.63157895
0.44444444 0.54545455 0.66666667 0.75 ]
mean value: 0.6825374041163514
key: train_fscore
value: [0.77659574 0.76595745 0.76616915 0.78918919 0.78172589 0.78350515
0.79787234 0.75675676 0.78074866 0.76190476]
mean value: 0.776042510006011
key: test_precision
value: [0.64285714 0.72727273 0.81818182 0.6 0.75 0.75
0.57142857 0.54545455 0.7 0.69230769]
mean value: 0.6797502497502498
key: train_precision
value: [0.78494624 0.77419355 0.72641509 0.81111111 0.75490196 0.76
0.79787234 0.76923077 0.78494624 0.75789474]
mean value: 0.772151203423883
key: test_recall
value: [0.9 0.8 0.9 0.6 0.9 0.54545455
0.36363636 0.54545455 0.63636364 0.81818182]
mean value: 0.7009090909090909
key: train_recall
value: [0.76842105 0.75789474 0.81052632 0.76842105 0.81052632 0.80851064
0.79787234 0.74468085 0.77659574 0.76595745]
mean value: 0.7809406494960806
key: test_roc_auc
value: [0.59285714 0.68571429 0.80714286 0.51428571 0.73571429 0.60606061
0.43181818 0.35606061 0.56818182 0.57575758]
mean value: 0.5873593073593074
key: train_roc_auc
value: [0.71179673 0.69791289 0.65526316 0.7376588 0.68974592 0.70086549
0.73791922 0.69437432 0.71880635 0.68806347]
mean value: 0.7032406345084143
key: test_jcc
value: [0.6 0.61538462 0.75 0.42857143 0.69230769 0.46153846
0.28571429 0.375 0.5 0.6 ]
mean value: 0.5308516483516483
key: train_jcc
value: [0.63478261 0.62068966 0.62096774 0.65178571 0.64166667 0.6440678
0.66371681 0.60869565 0.64035088 0.61538462]
mean value: 0.6342108142276903
MCC on Blind test: 0.29
Accuracy on Blind test: 0.67
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00887656 0.01209188 0.0101974 0.00957799 0.00946403 0.00974488
0.0111022 0.00958228 0.01056123 0.01019073]
mean value: 0.010138916969299316
key: score_time
value: [0.05371165 0.02205682 0.01622033 0.01528668 0.01598525 0.01636815
0.01718068 0.01583552 0.01490521 0.01607275]
mean value: 0.020362305641174316
key: test_mcc
value: [ 0.13241022 -0.11769798 -0.46409548 -0.07377111 -0.38729833 0.13241022
-0.11948803 -0.28787879 0.06356417 -0.11769798]
mean value: -0.12395430776647315
key: train_mcc
value: [0.40852687 0.3435988 0.41056782 0.3789188 0.30157232 0.3679126
0.34836646 0.43470567 0.35262985 0.35371983]
mean value: 0.37005190349511996
key: test_accuracy
value: [0.58823529 0.47058824 0.35294118 0.52941176 0.41176471 0.58823529
0.52941176 0.41176471 0.58823529 0.47058824]
mean value: 0.49411764705882355
key: train_accuracy
value: [0.73202614 0.70588235 0.73202614 0.71895425 0.68627451 0.7124183
0.69934641 0.73856209 0.70588235 0.70588235]
mean value: 0.7137254901960784
key: test_fscore
value: [0.66666667 0.57142857 0.52173913 0.66666667 0.58333333 0.66666667
0.66666667 0.54545455 0.69565217 0.57142857]
mean value: 0.6155702992659514
key: train_fscore
value: [0.80382775 0.79069767 0.8 0.79227053 0.76923077 0.79047619
0.76767677 0.7979798 0.784689 0.7826087 ]
mean value: 0.7879457173246753
key: test_precision
value: [0.63636364 0.54545455 0.46153846 0.57142857 0.5 0.7
0.61538462 0.54545455 0.66666667 0.6 ]
mean value: 0.5842291042291042
key: train_precision
value: [0.73684211 0.70833333 0.74545455 0.73214286 0.7079646 0.71551724
0.73076923 0.75961538 0.71304348 0.71681416]
mean value: 0.7266496937280635
key: test_recall
value: [0.7 0.6 0.6 0.8 0.7 0.63636364
0.72727273 0.54545455 0.72727273 0.54545455]
mean value: 0.6581818181818182
key: train_recall
value: [0.88421053 0.89473684 0.86315789 0.86315789 0.84210526 0.88297872
0.80851064 0.84042553 0.87234043 0.86170213]
mean value: 0.8613325867861142
key: test_roc_auc
value: [0.56428571 0.44285714 0.3 0.47142857 0.35 0.56818182
0.4469697 0.35606061 0.53030303 0.43939394]
mean value: 0.4469480519480519
key: train_roc_auc
value: [0.68348457 0.64564428 0.69019964 0.67295826 0.63656987 0.66182834
0.66696718 0.70834836 0.6565092 0.65966462]
mean value: 0.6682174330774522
key: test_jcc
value: [0.5 0.4 0.35294118 0.5 0.41176471 0.5
0.5 0.375 0.53333333 0.4 ]
mean value: 0.44730392156862747
key: train_jcc
value: [0.672 0.65384615 0.66666667 0.656 0.625 0.65354331
0.62295082 0.66386555 0.64566929 0.64285714]
mean value: 0.6502398927685779
MCC on Blind test: 0.12
Accuracy on Blind test: 0.6
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01092124 0.01062822 0.01066852 0.0106132 0.01072216 0.01072073
0.0104661 0.01049161 0.01138711 0.01187062]
mean value: 0.01084895133972168
key: score_time
value: [0.00920558 0.0091362 0.00914645 0.00922441 0.00928783 0.00917053
0.00909495 0.00927687 0.00960684 0.00933766]
mean value: 0.009248733520507812
key: test_mcc
value: [ 0.29880715 0.43643578 0.29880715 0.09944903 0.06546537 0.11236664
-0.01899343 -0.01899343 0.3385016 0.11236664]
mean value: 0.17242125126907817
key: train_mcc
value: [0.52447344 0.48191696 0.45248357 0.59244006 0.56560446 0.58429818
0.58307945 0.59739548 0.51726562 0.57111391]
mean value: 0.5470071110789382
key: test_accuracy
value: [0.64705882 0.70588235 0.64705882 0.58823529 0.58823529 0.64705882
0.58823529 0.58823529 0.70588235 0.64705882]
mean value: 0.6352941176470589
key: train_accuracy
value: [0.76470588 0.74509804 0.73202614 0.79738562 0.78431373 0.79084967
0.79738562 0.79738562 0.75816993 0.78431373]
mean value: 0.7751633986928105
key: test_fscore
value: [0.76923077 0.8 0.76923077 0.69565217 0.72 0.76923077
0.72 0.72 0.81481481 0.76923077]
mean value: 0.7547390065650935
key: train_fscore
value: [0.84070796 0.82969432 0.82251082 0.85972851 0.85201794 0.85454545
0.85581395 0.85844749 0.83555556 0.85067873]
mean value: 0.8459700739469289
key: test_precision
value: [0.625 0.66666667 0.625 0.61538462 0.6 0.66666667
0.64285714 0.64285714 0.6875 0.66666667]
mean value: 0.6438598901098901
key: train_precision
value: [0.72519084 0.70895522 0.69852941 0.75396825 0.7421875 0.74603175
0.76033058 0.752 0.71755725 0.74015748]
mean value: 0.7344908286075713
key: test_recall
value: [1. 1. 1. 0.8 0.9 0.90909091
0.81818182 0.81818182 1. 0.90909091]
mean value: 0.9154545454545455
key: train_recall
value: [1. 1. 1. 1. 1. 1. 0.9787234
1. 1. 1. ]
mean value: 0.9978723404255319
key: test_roc_auc
value: [0.57142857 0.64285714 0.57142857 0.54285714 0.52142857 0.53787879
0.49242424 0.49242424 0.58333333 0.53787879]
mean value: 0.5493939393939393
key: train_roc_auc
value: [0.68965517 0.6637931 0.64655172 0.73275862 0.71551724 0.72881356
0.74359899 0.73728814 0.68644068 0.72033898]
mean value: 0.7064756208264422
key: test_jcc
value: [0.625 0.66666667 0.625 0.53333333 0.5625 0.625
0.5625 0.5625 0.6875 0.625 ]
mean value: 0.6075
key: train_jcc
value: [0.72519084 0.70895522 0.69852941 0.75396825 0.7421875 0.74603175
0.74796748 0.752 0.71755725 0.74015748]
mean value: 0.7332545187238113
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.8853507 0.85311651 0.85744238 1.03879786 1.08934593 1.1350615
0.69020605 0.69074345 1.01780343 1.07197356]
mean value: 0.9329841375350952
key: score_time
value: [0.01643276 0.01357484 0.01391101 0.01479197 0.01400852 0.01294899
0.01254201 0.01256633 0.02390885 0.01315498]
mean value: 0.014784026145935058
key: test_mcc
value: [ 0.38251843 0.63262663 0.50920105 0.13241022 -0.01543033 0.69631062
0.29012943 0.17069719 0.74242424 -0.01899343]
mean value: 0.3521894047164522
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.70588235 0.82352941 0.76470588 0.58823529 0.52941176 0.82352941
0.64705882 0.64705882 0.88235294 0.58823529]
mean value: 0.7
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.76190476 0.85714286 0.81818182 0.66666667 0.63636364 0.84210526
0.7 0.75 0.90909091 0.72 ]
mean value: 0.7661455912508545
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.72727273 0.81818182 0.75 0.63636364 0.58333333 1.
0.77777778 0.69230769 0.90909091 0.64285714]
mean value: 0.7537185037185037
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.9 0.9 0.7 0.7 0.72727273
0.63636364 0.81818182 0.90909091 0.81818182]
mean value: 0.7909090909090909
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.68571429 0.80714286 0.73571429 0.56428571 0.49285714 0.86363636
0.65151515 0.57575758 0.87121212 0.49242424]
mean value: 0.6740259740259741
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.61538462 0.75 0.69230769 0.5 0.46666667 0.72727273
0.53846154 0.6 0.83333333 0.5625 ]
mean value: 0.6285926573426573
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.6
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.022614 0.01708698 0.01282835 0.01164603 0.01256561 0.01273346
0.01293755 0.0132103 0.01343513 0.01459908]
mean value: 0.014365649223327637
key: score_time
value: [0.02368164 0.00920272 0.0085516 0.0085299 0.00866437 0.00875759
0.00901079 0.0088861 0.00909567 0.00948334]
mean value: 0.010386371612548828
key: test_mcc
value: [0.38251843 0.77151675 0.7 0.51428571 0.88273483 0.88273483
1. 0.69631062 0.60385964 0.88273483]
mean value: 0.731669564240748
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.70588235 0.88235294 0.82352941 0.76470588 0.94117647 0.94117647
1. 0.82352941 0.82352941 0.94117647]
mean value: 0.8647058823529412
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.76190476 0.90909091 0.82352941 0.8 0.95238095 0.95238095
1. 0.84210526 0.86956522 0.95238095]
mean value: 0.8863338420452433
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.72727273 0.83333333 1. 0.8 0.90909091 1.
1. 1. 0.83333333 1. ]
mean value: 0.9103030303030303
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 1. 0.7 0.8 1. 0.90909091
1. 0.72727273 0.90909091 0.90909091]
mean value: 0.8754545454545455
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.68571429 0.85714286 0.85 0.75714286 0.92857143 0.95454545
1. 0.86363636 0.78787879 0.95454545]
mean value: 0.863917748917749
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.61538462 0.83333333 0.7 0.66666667 0.90909091 0.90909091
1. 0.72727273 0.76923077 0.90909091]
mean value: 0.8039160839160839
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.6
Accuracy on Blind test: 0.8
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10089493 0.09773612 0.09653449 0.09245515 0.0937891 0.09260631
0.09319448 0.09338927 0.09292459 0.09129429]
mean value: 0.09448187351226807
key: score_time
value: [0.01952076 0.01778722 0.01751566 0.01846743 0.01758528 0.01765728
0.01768947 0.01766562 0.01741314 0.01764417]
mean value: 0.017894601821899413
key: test_mcc
value: [0.38122129 0.77151675 0.50920105 0.27142857 0.24688536 0.63262663
0.22727273 0.11236664 0.60385964 0.22727273]
mean value: 0.3983651389885671
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.70588235 0.88235294 0.76470588 0.64705882 0.64705882 0.82352941
0.64705882 0.64705882 0.82352941 0.64705882]
mean value: 0.7235294117647059
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7826087 0.90909091 0.81818182 0.7 0.72727273 0.85714286
0.72727273 0.76923077 0.86956522 0.72727273]
mean value: 0.7887638448508013
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.69230769 0.83333333 0.75 0.7 0.66666667 0.9
0.72727273 0.66666667 0.83333333 0.72727273]
mean value: 0.7496853146853146
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9 1. 0.9 0.7 0.8 0.81818182
0.72727273 0.90909091 0.90909091 0.72727273]
mean value: 0.8390909090909091
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.66428571 0.85714286 0.73571429 0.63571429 0.61428571 0.82575758
0.61363636 0.53787879 0.78787879 0.61363636]
mean value: 0.6885930735930736
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.64285714 0.83333333 0.69230769 0.53846154 0.57142857 0.75
0.57142857 0.625 0.76923077 0.57142857]
mean value: 0.656547619047619
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00948 0.01067305 0.00918198 0.00951219 0.0090723 0.00886846
0.00886393 0.00893688 0.00908589 0.01140523]
mean value: 0.009507989883422852
key: score_time
value: [0.00905252 0.00954819 0.00903249 0.00884628 0.00869274 0.00862312
0.00868821 0.00864053 0.00882077 0.00961161]
mean value: 0.008955645561218261
key: test_mcc
value: [ 0.11769798 -0.27774603 0.27142857 0.13241022 0.38122129 0.22727273
0.33371191 0.38251843 0.29012943 0.22727273]
mean value: 0.20859172445585672
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.52941176 0.41176471 0.64705882 0.58823529 0.70588235 0.64705882
0.70588235 0.70588235 0.64705882 0.64705882]
mean value: 0.6235294117647059
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.5 0.54545455 0.7 0.66666667 0.7826087 0.72727273
0.7826087 0.76190476 0.7 0.72727273]
mean value: 0.6893788819875777
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.5 0.7 0.63636364 0.69230769 0.72727273
0.75 0.8 0.77777778 0.72727273]
mean value: 0.6977661227661227
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.4 0.6 0.7 0.7 0.9 0.72727273
0.81818182 0.72727273 0.63636364 0.72727273]
mean value: 0.6936363636363636
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.55714286 0.37142857 0.63571429 0.56428571 0.66428571 0.61363636
0.65909091 0.6969697 0.65151515 0.61363636]
mean value: 0.6027705627705628
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.33333333 0.375 0.53846154 0.5 0.64285714 0.57142857
0.64285714 0.61538462 0.53846154 0.57142857]
mean value: 0.5329212454212454
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.17
Accuracy on Blind test: 0.6
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.27949595 1.2522831 1.23157477 1.28226137 1.31405354 1.33238959
1.32696438 1.29168916 1.21863532 1.26388741]
mean value: 1.2793234586715698
key: score_time
value: [0.10485697 0.09505439 0.0964644 0.17082667 0.10032129 0.09584975
0.09731865 0.12909579 0.12827754 0.13281274]
mean value: 0.11508781909942627
key: test_mcc
value: [0.66299354 0.66299354 0.63262663 0.50920105 0.77151675 0.63262663
0.87400737 0.4608824 0.60385964 0.60385964]
mean value: 0.6414567202607909
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.82352941 0.82352941 0.82352941 0.76470588 0.88235294 0.82352941
0.94117647 0.76470588 0.82352941 0.82352941]
mean value: 0.8294117647058823
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.86956522 0.86956522 0.85714286 0.81818182 0.90909091 0.85714286
0.95652174 0.83333333 0.86956522 0.86956522]
mean value: 0.8709674383587427
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.76923077 0.76923077 0.81818182 0.75 0.83333333 0.9
0.91666667 0.76923077 0.83333333 0.83333333]
mean value: 0.8192540792540792
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.9 0.9 1. 0.81818182
1. 0.90909091 0.90909091 0.90909091]
mean value: 0.9345454545454546
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.78571429 0.78571429 0.80714286 0.73571429 0.85714286 0.82575758
0.91666667 0.70454545 0.78787879 0.78787879]
mean value: 0.7994155844155845
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.76923077 0.76923077 0.75 0.69230769 0.83333333 0.75
0.91666667 0.71428571 0.76923077 0.76923077]
mean value: 0.7733516483516484
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.58
Accuracy on Blind test: 0.8
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: fit_time
value: [1.94012594 0.93395567 0.88578105 0.94665599 1.04666948 1.01622057
1.05611706 1.00596952 1.00941062 1.69974065]
mean value: 1.154064655303955
key: score_time
value: [0.16018629 0.12835646 0.13689685 0.14965868 0.14444923 0.15999627
0.15356684 0.15862679 0.17733407 0.12240601]
mean value: 0.14914774894714355
key: test_mcc
value: [0.55328334 0.66299354 0.77151675 0.50920105 0.66299354 0.63262663
0.4608824 0.4608824 0.87400737 0.62678317]
mean value: 0.6215170201634855
key: train_mcc
value: [0.87638923 0.88986734 0.88986734 0.93172069 0.88986734 0.8640452
0.87733952 0.89069566 0.89069566 0.87733952]
mean value: 0.8877827514166594
key: test_accuracy
value: [0.76470588 0.82352941 0.88235294 0.76470588 0.82352941 0.82352941
0.76470588 0.76470588 0.94117647 0.82352941]
mean value: 0.8176470588235294
key: train_accuracy
value: [0.94117647 0.94771242 0.94771242 0.96732026 0.94771242 0.93464052
0.94117647 0.94771242 0.94771242 0.94117647]
mean value: 0.9464052287581699
key: test_fscore
value: [0.83333333 0.86956522 0.90909091 0.81818182 0.86956522 0.85714286
0.83333333 0.83333333 0.95652174 0.88 ]
mean value: 0.8660067758328628
key: train_fscore
value: [0.95431472 0.95918367 0.95918367 0.97435897 0.95918367 0.94897959
0.95384615 0.95876289 0.95876289 0.95384615]
mean value: 0.9580422388304239
key: test_precision
value: [0.71428571 0.76923077 0.83333333 0.75 0.76923077 0.9
0.76923077 0.76923077 0.91666667 0.78571429]
mean value: 0.7976923076923077
key: train_precision
value: [0.92156863 0.93069307 0.93069307 0.95 0.93069307 0.91176471
0.92079208 0.93 0.93 0.92079208]
mean value: 0.9276996699669967
key: test_recall
value: [1. 1. 1. 0.9 1. 0.81818182
0.90909091 0.90909091 1. 1. ]
mean value: 0.9536363636363636
key: train_recall
value: [0.98947368 0.98947368 0.98947368 1. 0.98947368 0.9893617
0.9893617 0.9893617 0.9893617 0.9893617 ]
mean value: 0.9904703247480403
key: test_roc_auc
value: [0.71428571 0.78571429 0.85714286 0.73571429 0.78571429 0.82575758
0.70454545 0.70454545 0.91666667 0.75 ]
mean value: 0.778008658008658
key: train_roc_auc
value: [0.92577132 0.93439201 0.93439201 0.95689655 0.93439201 0.91840966
0.92688424 0.93535882 0.93535882 0.92688424]
mean value: 0.9328739700888068
key: test_jcc
value: [0.71428571 0.76923077 0.83333333 0.69230769 0.76923077 0.75
0.71428571 0.71428571 0.91666667 0.78571429]
mean value: 0.765934065934066
key: train_jcc
value: [0.91262136 0.92156863 0.92156863 0.95 0.92156863 0.90291262
0.91176471 0.92079208 0.92079208 0.91176471]
mean value: 0.9195353433116012
MCC on Blind test: 0.74
Accuracy on Blind test: 0.87
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01290655 0.01201439 0.0091145 0.00908995 0.00977802 0.00959659
0.00912237 0.0090313 0.00955582 0.00962853]
mean value: 0.00998380184173584
key: score_time
value: [0.01208782 0.00901628 0.00966477 0.0090487 0.00951076 0.00898027
0.00883889 0.00868464 0.00920987 0.00892973]
mean value: 0.009397172927856445
key: test_mcc
value: [ 0.23975611 0.38251843 0.63262663 0.02857143 0.50920105 0.2030906
-0.13241022 -0.28787879 0.13241022 0.17069719]
mean value: 0.18785826455207946
key: train_mcc
value: [0.42092813 0.39333516 0.32656704 0.46856319 0.39056476 0.41094842
0.47583844 0.38542713 0.43627743 0.37735366]
mean value: 0.40858033755597606
key: test_accuracy
value: [0.64705882 0.70588235 0.82352941 0.52941176 0.76470588 0.58823529
0.41176471 0.41176471 0.58823529 0.64705882]
mean value: 0.611764705882353
key: train_accuracy
value: [0.7254902 0.7124183 0.69281046 0.74509804 0.71895425 0.7254902
0.75163399 0.70588235 0.73202614 0.70588235]
mean value: 0.7215686274509804
key: test_fscore
value: [0.75 0.76190476 0.85714286 0.6 0.81818182 0.63157895
0.44444444 0.54545455 0.66666667 0.75 ]
mean value: 0.6825374041163514
key: train_fscore
value: [0.77659574 0.76595745 0.76616915 0.78918919 0.78172589 0.78350515
0.79787234 0.75675676 0.78074866 0.76190476]
mean value: 0.776042510006011
key: test_precision
value: [0.64285714 0.72727273 0.81818182 0.6 0.75 0.75
0.57142857 0.54545455 0.7 0.69230769]
mean value: 0.6797502497502498
key: train_precision
value: [0.78494624 0.77419355 0.72641509 0.81111111 0.75490196 0.76
0.79787234 0.76923077 0.78494624 0.75789474]
mean value: 0.772151203423883
key: test_recall
value: [0.9 0.8 0.9 0.6 0.9 0.54545455
0.36363636 0.54545455 0.63636364 0.81818182]
mean value: 0.7009090909090909
key: train_recall
value: [0.76842105 0.75789474 0.81052632 0.76842105 0.81052632 0.80851064
0.79787234 0.74468085 0.77659574 0.76595745]
mean value: 0.7809406494960806
key: test_roc_auc
value: [0.59285714 0.68571429 0.80714286 0.51428571 0.73571429 0.60606061
0.43181818 0.35606061 0.56818182 0.57575758]
mean value: 0.5873593073593074
key: train_roc_auc
value: [0.71179673 0.69791289 0.65526316 0.7376588 0.68974592 0.70086549
0.73791922 0.69437432 0.71880635 0.68806347]
mean value: 0.7032406345084143
key: test_jcc
value: [0.6 0.61538462 0.75 0.42857143 0.69230769 0.46153846
0.28571429 0.375 0.5 0.6 ]
mean value: 0.5308516483516483
key: train_jcc
value: [0.63478261 0.62068966 0.62096774 0.65178571 0.64166667 0.6440678
0.66371681 0.60869565 0.64035088 0.61538462]
mean value: 0.6342108142276903
MCC on Blind test: 0.29
Accuracy on Blind test: 0.67
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.95170927 1.08157611 4.2941637 1.35887098 1.44883847 1.49672079
1.49727535 1.45497322 4.00354838 5.43425512]
mean value: 2.3021931409835816
key: score_time
value: [0.01140547 0.05233741 0.01315928 0.01244068 0.0132606 0.01195669
0.01259422 0.01315713 0.02558994 0.01518345]
mean value: 0.018108487129211426
key: test_mcc
value: [0.38122129 0.66299354 0.88741197 0.63262663 0.75714286 0.87400737
1. 0.78334945 0.87400737 0.88273483]
mean value: 0.7735495312682757
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.70588235 0.82352941 0.94117647 0.82352941 0.88235294 0.94117647
1. 0.88235294 0.94117647 0.94117647]
mean value: 0.888235294117647
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7826087 0.86956522 0.94736842 0.85714286 0.9 0.95652174
1. 0.9 0.95652174 0.95238095]
mean value: 0.912210962188079
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.69230769 0.76923077 1. 0.81818182 0.9 0.91666667
1. 1. 0.91666667 1. ]
mean value: 0.9013053613053613
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9 1. 0.9 0.9 0.9 1.
1. 0.81818182 1. 0.90909091]
mean value: 0.9327272727272727
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.66428571 0.78571429 0.95 0.80714286 0.87857143 0.91666667
1. 0.90909091 0.91666667 0.95454545]
mean value: 0.8782683982683983
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.64285714 0.76923077 0.9 0.75 0.81818182 0.91666667
1. 0.81818182 0.91666667 0.90909091]
mean value: 0.8440875790875791
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04736233 0.06068015 0.0650456 0.06523633 0.05861163 0.05586934
0.0611794 0.06430578 0.06066442 0.05646658]
mean value: 0.059542155265808104
key: score_time
value: [0.0297966 0.0236876 0.01204276 0.02082038 0.02062225 0.01970601
0.02349544 0.0230937 0.02011728 0.02406526]
mean value: 0.021744728088378906
key: test_mcc
value: [0.51428571 0.63262663 0.30988989 0.50920105 0.07042952 0.74242424
0.53673944 0.4608824 0.2030906 0.78334945]
mean value: 0.4762918944486056
key: train_mcc
value: [0.95830113 0.94445829 0.98616507 0.95830113 1. 0.94483888
0.98625704 0.98625704 1. 0.97261224]
mean value: 0.9737190818635186
key: test_accuracy
value: [0.76470588 0.82352941 0.64705882 0.76470588 0.52941176 0.88235294
0.76470588 0.76470588 0.58823529 0.88235294]
mean value: 0.7411764705882353
key: train_accuracy
value: [0.98039216 0.97385621 0.99346405 0.98039216 1. 0.97385621
0.99346405 0.99346405 1. 0.9869281 ]
mean value: 0.9875816993464053
key: test_fscore
value: [0.8 0.85714286 0.66666667 0.81818182 0.55555556 0.90909091
0.8 0.83333333 0.63157895 0.9 ]
mean value: 0.777155008733956
key: train_fscore
value: [0.98429319 0.97916667 0.9947644 0.98429319 1. 0.97894737
0.99470899 0.99470899 1. 0.98947368]
mean value: 0.9900356494056549
key: test_precision
value: [0.8 0.81818182 0.75 0.75 0.625 0.90909091
0.88888889 0.76923077 0.75 1. ]
mean value: 0.8060392385392385
key: train_precision
value: [0.97916667 0.96907216 0.98958333 0.97916667 1. 0.96875
0.98947368 0.98947368 1. 0.97916667]
mean value: 0.9843852866702839
key: test_recall
value: [0.8 0.9 0.6 0.9 0.5 0.90909091
0.72727273 0.90909091 0.54545455 0.81818182]
mean value: 0.7609090909090909
key: train_recall
value: [0.98947368 0.98947368 1. 0.98947368 1. 0.9893617
1. 1. 1. 1. ]
mean value: 0.9957782754759239
key: test_roc_auc
value: [0.75714286 0.80714286 0.65714286 0.73571429 0.53571429 0.87121212
0.78030303 0.70454545 0.60606061 0.90909091]
mean value: 0.7364069264069264
key: train_roc_auc
value: [0.97749546 0.96887477 0.99137931 0.97749546 1. 0.96925712
0.99152542 0.99152542 1. 0.98305085]
mean value: 0.9850603826239935
key: test_jcc
value: [0.66666667 0.75 0.5 0.69230769 0.38461538 0.83333333
0.66666667 0.71428571 0.46153846 0.81818182]
mean value: 0.6487595737595737
key: train_jcc
value: [0.96907216 0.95918367 0.98958333 0.96907216 1. 0.95876289
0.98947368 0.98947368 1. 0.97916667]
mean value: 0.9803788258385285
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02010775 0.00967646 0.00920439 0.00903702 0.00907183 0.00884581
0.00869846 0.0089879 0.0096581 0.00971985]
mean value: 0.010300755500793457
key: score_time
value: [0.01351142 0.0090692 0.00885296 0.00862312 0.00895929 0.00884748
0.00848365 0.00872898 0.00927663 0.00903773]
mean value: 0.009339046478271485
key: test_mcc
value: [ 0.55328334 0.50920105 0.66299354 0.02857143 0.24688536 0.22727273
0.04351941 -0.11948803 0.33371191 0.49441323]
mean value: 0.29803639728108056
key: train_mcc
value: [0.42621329 0.33692443 0.38059794 0.36668738 0.39480728 0.43071005
0.43299259 0.3747783 0.37096514 0.38834821]
mean value: 0.3903024610468177
key: test_accuracy
value: [0.76470588 0.76470588 0.82352941 0.52941176 0.64705882 0.64705882
0.52941176 0.52941176 0.70588235 0.76470588]
mean value: 0.6705882352941176
key: train_accuracy
value: [0.73856209 0.69934641 0.71895425 0.7124183 0.7254902 0.73856209
0.73856209 0.7124183 0.7124183 0.71895425]
mean value: 0.7215686274509804
key: test_fscore
value: [0.83333333 0.81818182 0.86956522 0.6 0.72727273 0.72727273
0.6 0.66666667 0.7826087 0.84615385]
mean value: 0.7471055031924597
key: train_fscore
value: [0.80392157 0.7745098 0.7902439 0.78431373 0.7961165 0.80392157
0.8 0.78 0.78431373 0.78606965]
mean value: 0.790341045119155
key: test_precision
value: [0.71428571 0.75 0.76923077 0.6 0.66666667 0.72727273
0.66666667 0.61538462 0.75 0.73333333]
mean value: 0.6992840492840493
key: train_precision
value: [0.75229358 0.72477064 0.73636364 0.73394495 0.73873874 0.74545455
0.75471698 0.73584906 0.72727273 0.73831776]
mean value: 0.7387722616886769
key: test_recall
value: [1. 0.9 1. 0.6 0.8 0.72727273
0.54545455 0.72727273 0.81818182 1. ]
mean value: 0.8118181818181818
key: train_recall
value: [0.86315789 0.83157895 0.85263158 0.84210526 0.86315789 0.87234043
0.85106383 0.82978723 0.85106383 0.84042553]
mean value: 0.8497312430011198
key: test_roc_auc
value: [0.71428571 0.73571429 0.78571429 0.51428571 0.61428571 0.61363636
0.52272727 0.4469697 0.65909091 0.66666667]
mean value: 0.6273376623376623
key: train_roc_auc
value: [0.69882033 0.65716878 0.67631579 0.67105263 0.68157895 0.69888208
0.70519293 0.67760548 0.67129463 0.68292463]
mean value: 0.682083622669467
key: test_jcc
value: [0.71428571 0.69230769 0.76923077 0.42857143 0.57142857 0.57142857
0.42857143 0.5 0.64285714 0.73333333]
mean value: 0.6052014652014652
key: train_jcc
value: [0.67213115 0.632 0.65322581 0.64516129 0.66129032 0.67213115
0.66666667 0.63934426 0.64516129 0.64754098]
mean value: 0.6534652917327692
MCC on Blind test: -0.07
Accuracy on Blind test: 0.53
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01129508 0.01568604 0.01332641 0.03448272 0.01468325 0.01636481
0.03756166 0.05245757 0.02793503 0.01410508]
mean value: 0.02378976345062256
key: score_time
value: [0.00920486 0.01116633 0.01118207 0.02200365 0.01170421 0.01170301
0.0180769 0.02105355 0.01168871 0.01157594]
mean value: 0.01393592357635498
key: test_mcc
value: [0.51428571 0.55328334 0.63262663 0.38122129 0.29880715 0.88273483
0.49441323 0.33371191 0.47673129 0.30389487]
mean value: 0.4871710250874902
key: train_mcc
value: [0.90340823 0.86531409 0.86241574 0.83628052 0.65815792 0.95883964
0.51726562 0.87413232 0.61903367 0.73827438]
mean value: 0.783312212337401
key: test_accuracy
value: [0.76470588 0.76470588 0.82352941 0.70588235 0.64705882 0.94117647
0.76470588 0.70588235 0.64705882 0.70588235]
mean value: 0.7470588235294118
key: train_accuracy
value: [0.95424837 0.93464052 0.93464052 0.92156863 0.83006536 0.98039216
0.75816993 0.93464052 0.76470588 0.86928105]
mean value: 0.888235294117647
key: test_fscore
value: [0.8 0.83333333 0.85714286 0.7826087 0.76923077 0.95238095
0.84615385 0.7826087 0.625 0.8 ]
mean value: 0.8048459149546106
key: train_fscore
value: [0.96410256 0.95 0.94680851 0.94 0.87962963 0.98395722
0.83555556 0.94382022 0.76315789 0.90384615]
mean value: 0.9110877752479482
key: test_precision
value: [0.8 0.71428571 0.81818182 0.69230769 0.625 1.
0.73333333 0.75 1. 0.71428571]
mean value: 0.7847394272394272
key: train_precision
value: [0.94 0.9047619 0.95698925 0.8952381 0.78512397 0.98924731
0.71755725 1. 1. 0.8245614 ]
mean value: 0.9013479181499102
key: test_recall
value: [0.8 1. 0.9 0.9 1. 0.90909091
1. 0.81818182 0.45454545 0.90909091]
mean value: 0.8690909090909091
key: train_recall
value: [0.98947368 1. 0.93684211 0.98947368 1. 0.9787234
1. 0.89361702 0.61702128 1. ]
mean value: 0.940515117581187
key: test_roc_auc
value: [0.75714286 0.71428571 0.80714286 0.66428571 0.57142857 0.95454545
0.66666667 0.65909091 0.72727273 0.62121212]
mean value: 0.7143073593073593
key: train_roc_auc
value: [0.9430127 0.9137931 0.93393829 0.89990926 0.77586207 0.98088713
0.68644068 0.94680851 0.80851064 0.83050847]
mean value: 0.8719670853832294
key: test_jcc
value: [0.66666667 0.71428571 0.75 0.64285714 0.625 0.90909091
0.73333333 0.64285714 0.45454545 0.66666667]
mean value: 0.680530303030303
key: train_jcc
value: [0.93069307 0.9047619 0.8989899 0.88679245 0.78512397 0.96842105
0.71755725 0.89361702 0.61702128 0.8245614 ]
mean value: 0.842753929875216
MCC on Blind test: 0.6
Accuracy on Blind test: 0.8
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01497602 0.02522302 0.03343582 0.03448534 0.02283978 0.03451777
0.03477836 0.03254557 0.01978588 0.01348734]
mean value: 0.026607489585876463
key: score_time
value: [0.01177239 0.01971555 0.02048707 0.01987171 0.02130342 0.01973534
0.03330612 0.02216315 0.02278042 0.01166964]
mean value: 0.02028048038482666
key: test_mcc
value: [0.36780618 0.66299354 0.36780618 0.38122129 0.06546537 0.88273483
0.26967994 0.2030906 0.4608824 0.3385016 ]
mean value: 0.4000181931146463
key: train_mcc
value: [0.68055705 0.55202478 0.80732775 0.84960093 0.48191696 0.83778301
0.74694017 0.71803726 0.78917952 0.36822985]
mean value: 0.6831597278235293
key: test_accuracy
value: [0.64705882 0.82352941 0.64705882 0.70588235 0.58823529 0.94117647
0.47058824 0.58823529 0.76470588 0.70588235]
mean value: 0.6882352941176471
key: train_accuracy
value: [0.81045752 0.77777778 0.89542484 0.92810458 0.74509804 0.92156863
0.85620915 0.83660131 0.89542484 0.69281046]
mean value: 0.8359477124183007
key: test_fscore
value: [0.625 0.86956522 0.625 0.7826087 0.72 0.95238095
0.30769231 0.63157895 0.83333333 0.81481481]
mean value: 0.7161974268633308
key: train_fscore
value: [0.81987578 0.84821429 0.90804598 0.94472362 0.82969432 0.93478261
0.86746988 0.84662577 0.92156863 0.8 ]
mean value: 0.8721000862893723
key: test_precision
value: [0.83333333 0.76923077 0.83333333 0.69230769 0.6 1.
1. 0.75 0.76923077 0.6875 ]
mean value: 0.7934935897435897
key: train_precision
value: [1. 0.73643411 1. 0.90384615 0.70895522 0.95555556
1. 1. 0.85454545 0.66666667]
mean value: 0.882600316302156
key: test_recall
value: [0.5 1. 0.5 0.9 0.9 0.90909091
0.18181818 0.54545455 0.90909091 1. ]
mean value: 0.7345454545454545
key: train_recall
value: [0.69473684 1. 0.83157895 0.98947368 1. 0.91489362
0.76595745 0.73404255 1. 1. ]
mean value: 0.8930683090705487
key: test_roc_auc
value: [0.67857143 0.78571429 0.67857143 0.66428571 0.52142857 0.95454545
0.59090909 0.60606061 0.70454545 0.58333333]
mean value: 0.6767965367965368
key: train_roc_auc
value: [0.84736842 0.70689655 0.91578947 0.90852995 0.6637931 0.9235485
0.88297872 0.86702128 0.86440678 0.60169492]
mean value: 0.8182027693803942
key: test_jcc
value: [0.45454545 0.76923077 0.45454545 0.64285714 0.5625 0.90909091
0.18181818 0.46153846 0.71428571 0.6875 ]
mean value: 0.5837912087912088
key: train_jcc
value: [0.69473684 0.73643411 0.83157895 0.8952381 0.70895522 0.87755102
0.76595745 0.73404255 0.85454545 0.66666667]
mean value: 0.7765706358739792
MCC on Blind test: 0.22
Accuracy on Blind test: 0.47
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.13799787 0.19559717 0.13715434 0.13844132 0.13842058 0.20680475
0.13890123 0.13737702 0.13736558 0.13849258]
mean value: 0.15065524578094483
key: score_time
value: [0.02109694 0.02060199 0.02058935 0.02053523 0.02055168 0.02046013
0.02037907 0.02750397 0.02053142 0.02038026]
mean value: 0.0212630033493042
key: test_mcc
value: [0.50920105 0.66299354 0.7 0.63262663 0.54935027 0.53673944
1. 0.63262663 0.60385964 1. ]
mean value: 0.6827397199255986
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.76470588 0.82352941 0.82352941 0.82352941 0.76470588 0.76470588
1. 0.82352941 0.82352941 1. ]
mean value: 0.8411764705882353
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.81818182 0.86956522 0.82352941 0.85714286 0.77777778 0.8
1. 0.85714286 0.86956522 1. ]
mean value: 0.8672905156792625
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.76923077 1. 0.81818182 0.875 0.88888889
1. 0.9 0.83333333 1. ]
mean value: 0.8834634809634809
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9 1. 0.7 0.9 0.7 0.72727273
1. 0.81818182 0.90909091 1. ]
mean value: 0.8654545454545455
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.73571429 0.78571429 0.85 0.80714286 0.77857143 0.78030303
1. 0.82575758 0.78787879 1. ]
mean value: 0.8351082251082251
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.69230769 0.76923077 0.7 0.75 0.63636364 0.66666667
1. 0.75 0.76923077 1. ]
mean value: 0.7733799533799534
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.08164883 0.06786728 0.0824008 0.08094072 0.07851148 0.07243514
0.07851815 0.07457829 0.06393242 0.07801318]
mean value: 0.07588462829589844
key: score_time
value: [0.02787757 0.02429485 0.02658248 0.0281651 0.02558804 0.02345562
0.02348995 0.02183986 0.02336168 0.02322936]
mean value: 0.024788451194763184
key: test_mcc
value: [0.66299354 0.77151675 0.78881064 0.63262663 0.75714286 0.87400737
1. 0.78334945 0.74242424 0.88273483]
mean value: 0.7895606313848224
key: train_mcc
value: [0.98625704 1. 0.95857961 1. 0.9722323 1.
0.97241255 0.98625704 0.95857961 0.98625704]
mean value: 0.9820575189370783
key: test_accuracy
value: [0.82352941 0.88235294 0.88235294 0.82352941 0.88235294 0.94117647
1. 0.88235294 0.88235294 0.94117647]
mean value: 0.8941176470588235
key: train_accuracy
value: [0.99346405 1. 0.98039216 1. 0.9869281 1.
0.9869281 0.99346405 0.98039216 0.99346405]
mean value: 0.9915032679738562
key: test_fscore
value: [0.86956522 0.90909091 0.88888889 0.85714286 0.9 0.95652174
1. 0.9 0.90909091 0.95238095]
mean value: 0.9142681473116256
key: train_fscore
value: [0.99470899 1. 0.98412698 1. 0.98947368 1.
0.9893617 0.99470899 0.98412698 0.99470899]
mean value: 0.9931216338719138
key: test_precision
value: [0.76923077 0.83333333 1. 0.81818182 0.9 0.91666667
1. 1. 0.90909091 1. ]
mean value: 0.9146503496503496
key: train_precision
value: [1. 1. 0.9893617 1. 0.98947368 1.
0.9893617 0.98947368 0.97894737 0.98947368]
mean value: 0.9926091825307951
key: test_recall
value: [1. 1. 0.8 0.9 0.9 1.
1. 0.81818182 0.90909091 0.90909091]
mean value: 0.9236363636363636
key: train_recall
value: [0.98947368 1. 0.97894737 1. 0.98947368 1.
0.9893617 1. 0.9893617 1. ]
mean value: 0.9936618141097424
key: test_roc_auc
value: [0.78571429 0.85714286 0.9 0.80714286 0.87857143 0.91666667
1. 0.90909091 0.87121212 0.95454545]
mean value: 0.888008658008658
key: train_roc_auc
value: [0.99473684 1. 0.98085299 1. 0.98611615 1.
0.98620627 0.99152542 0.9777317 0.99152542]
mean value: 0.9908694809882435
key: test_jcc
value: [0.76923077 0.83333333 0.8 0.75 0.81818182 0.91666667
1. 0.81818182 0.83333333 0.90909091]
mean value: 0.8448018648018648
key: train_jcc
value: [0.98947368 1. 0.96875 1. 0.97916667 1.
0.97894737 0.98947368 0.96875 0.98947368]
mean value: 0.9864035087719298
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.08835268 0.09343934 0.08267331 0.09665108 0.07296014 0.09550357
0.10921788 0.09168172 0.07537723 0.08310604]
mean value: 0.0888962984085083
key: score_time
value: [0.03113747 0.02139211 0.03570366 0.02917075 0.02335262 0.03179145
0.03920627 0.03474498 0.03395534 0.02556109]
mean value: 0.03060157299041748
key: test_mcc
value: [ 0.13241022 0.38251843 0.23975611 -0.27774603 -0.18232322 0.33371191
0.04351941 0.11236664 0.4608824 0.33371191]
mean value: 0.1578807778938489
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.58823529 0.70588235 0.64705882 0.41176471 0.47058824 0.70588235
0.52941176 0.64705882 0.76470588 0.70588235]
mean value: 0.6176470588235294
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.76190476 0.75 0.54545455 0.60869565 0.7826087
0.6 0.76923077 0.83333333 0.7826087 ]
mean value: 0.7100503120068337
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.63636364 0.72727273 0.64285714 0.5 0.53846154 0.75
0.66666667 0.66666667 0.76923077 0.75 ]
mean value: 0.6647519147519148
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.7 0.8 0.9 0.6 0.7 0.81818182
0.54545455 0.90909091 0.90909091 0.81818182]
mean value: 0.77
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.56428571 0.68571429 0.59285714 0.37142857 0.42142857 0.65909091
0.52272727 0.53787879 0.70454545 0.65909091]
mean value: 0.5719047619047619
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.61538462 0.6 0.375 0.4375 0.64285714
0.42857143 0.625 0.71428571 0.64285714]
mean value: 0.5581456043956045
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.53
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.53434014 0.47350907 0.3815167 0.38136268 0.38167262 0.37864709
0.38625383 0.3857789 0.41710877 0.39069295]
mean value: 0.4110882759094238
key: score_time
value: [0.0129323 0.01272464 0.01275206 0.01263762 0.01286578 0.01278806
0.0126791 0.01264119 0.01395893 0.01256704]
mean value: 0.012854671478271485
key: test_mcc
value: [0.77151675 0.77151675 0.88741197 0.63262663 0.75714286 0.87400737
1. 0.78334945 0.87400737 1. ]
mean value: 0.8351579150791348
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88235294 0.88235294 0.94117647 0.82352941 0.88235294 0.94117647
1. 0.88235294 0.94117647 1. ]
mean value: 0.9176470588235294
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.90909091 0.94736842 0.85714286 0.9 0.95652174
1. 0.9 0.95652174 1. ]
mean value: 0.9335736574638177
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.83333333 0.83333333 1. 0.81818182 0.9 0.91666667
1. 1. 0.91666667 1. ]
mean value: 0.9218181818181819
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.9 0.9 0.9 1.
1. 0.81818182 1. 1. ]
mean value: 0.9518181818181818
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.85714286 0.85714286 0.95 0.80714286 0.87857143 0.91666667
1. 0.90909091 0.91666667 1. ]
mean value: 0.9092424242424243
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.83333333 0.9 0.75 0.81818182 0.91666667
1. 0.81818182 0.91666667 1. ]
mean value: 0.8786363636363637
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02650714 0.05811 0.05595326 0.04492354 0.07738733 0.04849219
0.03797841 0.04312611 0.06586456 0.07221007]
mean value: 0.05305526256561279
key: score_time
value: [0.01903915 0.03109527 0.02601171 0.02103806 0.02057695 0.01910973
0.02088594 0.0189662 0.02482319 0.02623796]
mean value: 0.022778415679931642
key: test_mcc
value: [ 0.13241022 0.38122129 -0.30550505 0.23975611 0.38122129 0.30389487
0.3385016 -0.01899343 0.06356417 0.30389487]
mean value: 0.18199659502726337
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.58823529 0.70588235 0.47058824 0.64705882 0.70588235 0.70588235
0.70588235 0.58823529 0.58823529 0.70588235]
mean value: 0.6411764705882353
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.7826087 0.64 0.75 0.7826087 0.8
0.81481481 0.72 0.69565217 0.8 ]
mean value: 0.7452351046698873
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.63636364 0.69230769 0.53333333 0.64285714 0.69230769 0.71428571
0.6875 0.64285714 0.66666667 0.71428571]
mean value: 0.6622764735264736
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.7 0.9 0.8 0.9 0.9 0.90909091
1. 0.81818182 0.72727273 0.90909091]
mean value: 0.8563636363636363
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.56428571 0.66428571 0.4 0.59285714 0.66428571 0.62121212
0.58333333 0.49242424 0.53030303 0.62121212]
mean value: 0.5734199134199134
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.64285714 0.47058824 0.6 0.64285714 0.66666667
0.6875 0.5625 0.53333333 0.66666667]
mean value: 0.597296918767507
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.27
Accuracy on Blind test: 0.67
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.08099842 0.08981967 0.04990792 0.05527115 0.05714655 0.05031085
0.05187988 0.05764365 0.05021954 0.05180001]
mean value: 0.05949976444244385
key: score_time
value: [0.0333674 0.02936316 0.02458978 0.03190279 0.02809477 0.03452396
0.02786756 0.02454042 0.02761149 0.02961063]
mean value: 0.02914719581604004
key: test_mcc
value: [0.51428571 0.66299354 0.27142857 0.63262663 0.63262663 0.88273483
0.38251843 0.33371191 0.74242424 0.33371191]
mean value: 0.5389062395989187
key: train_mcc
value: [0.9306986 0.9587737 0.91649194 0.90340823 0.93172069 0.90411865
0.94559731 0.95906064 0.91761348 0.90330977]
mean value: 0.9270793013098675
key: test_accuracy
value: [0.76470588 0.82352941 0.64705882 0.82352941 0.82352941 0.94117647
0.70588235 0.70588235 0.88235294 0.70588235]
mean value: 0.7823529411764706
key: train_accuracy
value: [0.96732026 0.98039216 0.96078431 0.95424837 0.96732026 0.95424837
0.97385621 0.98039216 0.96078431 0.95424837]
mean value: 0.965359477124183
key: test_fscore
value: [0.8 0.86956522 0.7 0.85714286 0.85714286 0.95238095
0.76190476 0.7826087 0.90909091 0.7826087 ]
mean value: 0.827244494635799
key: train_fscore
value: [0.97409326 0.98445596 0.96875 0.96410256 0.97435897 0.96373057
0.97916667 0.98429319 0.96875 0.96335079]
mean value: 0.9725051976931911
key: test_precision
value: [0.8 0.76923077 0.7 0.81818182 0.81818182 1.
0.8 0.75 0.90909091 0.75 ]
mean value: 0.8114685314685315
key: train_precision
value: [0.95918367 0.96938776 0.95876289 0.94 0.95 0.93939394
0.95918367 0.96907216 0.94897959 0.94845361]
mean value: 0.9542417293065305
key: test_recall
value: [0.8 1. 0.7 0.9 0.9 0.90909091
0.72727273 0.81818182 0.90909091 0.81818182]
mean value: 0.8481818181818181
key: train_recall
value: [0.98947368 1. 0.97894737 0.98947368 1. 0.9893617
1. 1. 0.9893617 0.9787234 ]
mean value: 0.9915341545352744
key: test_roc_auc
value: [0.75714286 0.78571429 0.63571429 0.80714286 0.80714286 0.95454545
0.6969697 0.65909091 0.87121212 0.65909091]
mean value: 0.7633766233766234
key: train_roc_auc
value: [0.96025408 0.97413793 0.95499093 0.9430127 0.95689655 0.94383339
0.96610169 0.97457627 0.95230797 0.94698882]
mean value: 0.9573100346025291
key: test_jcc
value: [0.66666667 0.76923077 0.53846154 0.75 0.75 0.90909091
0.61538462 0.64285714 0.83333333 0.64285714]
mean value: 0.7117882117882118
key: train_jcc
value: [0.94949495 0.96938776 0.93939394 0.93069307 0.95 0.93
0.95918367 0.96907216 0.93939394 0.92929293]
mean value: 0.946591242040257
MCC on Blind test: 0.61
Accuracy on Blind test: 0.8
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.411973 0.34421086 0.39803886 0.35081291 0.3494761 0.33145618
0.3288908 0.35393381 0.34595108 0.33966637]
mean value: 0.3554409980773926
key: score_time
value: [0.03101277 0.03190875 0.02901888 0.02524614 0.03131557 0.02620721
0.0313201 0.02892303 0.03440547 0.02825022]
mean value: 0.02976081371307373
key: test_mcc
value: [0.51428571 0.66299354 0.27142857 0.63262663 0.63262663 0.88273483
0.38251843 0.33371191 0.63262663 0.33371191]
mean value: 0.5279264781376682
key: train_mcc
value: [0.9306986 0.9587737 0.91649194 0.90340823 0.94445829 0.90411865
0.94559731 0.95906064 0.93118521 0.90330977]
mean value: 0.929710233179641
key: test_accuracy
value: [0.76470588 0.82352941 0.64705882 0.82352941 0.82352941 0.94117647
0.70588235 0.70588235 0.82352941 0.70588235]
mean value: 0.7764705882352941
key: train_accuracy
value: [0.96732026 0.98039216 0.96078431 0.95424837 0.97385621 0.95424837
0.97385621 0.98039216 0.96732026 0.95424837]
mean value: 0.9666666666666667
key: test_fscore
value: [0.8 0.86956522 0.7 0.85714286 0.85714286 0.95238095
0.76190476 0.7826087 0.85714286 0.7826087 ]
mean value: 0.8220496894409938
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_sl.py:107: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_sl.py:110: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore
value: [0.97409326 0.98445596 0.96875 0.96410256 0.97916667 0.96373057
0.97916667 0.98429319 0.97382199 0.96335079]
mean value: 0.9734931658768399
key: test_precision
value: [0.8 0.76923077 0.7 0.81818182 0.81818182 1.
0.8 0.75 0.9 0.75 ]
mean value: 0.8105594405594406
key: train_precision
value: [0.95918367 0.96938776 0.95876289 0.94 0.96907216 0.93939394
0.95918367 0.96907216 0.95876289 0.94845361]
mean value: 0.9571272752774962
key: test_recall
value: [0.8 1. 0.7 0.9 0.9 0.90909091
0.72727273 0.81818182 0.81818182 0.81818182]
mean value: 0.8390909090909091
key: train_recall
value: [0.98947368 1. 0.97894737 0.98947368 0.98947368 0.9893617
1. 1. 0.9893617 0.9787234 ]
mean value: 0.990481522956327
key: test_roc_auc
value: [0.75714286 0.78571429 0.63571429 0.80714286 0.80714286 0.95454545
0.6969697 0.65909091 0.82575758 0.65909091]
mean value: 0.7588311688311689
key: train_roc_auc
value: [0.96025408 0.97413793 0.95499093 0.9430127 0.96887477 0.94383339
0.96610169 0.97457627 0.96078255 0.94698882]
mean value: 0.9593553143712085
key: test_jcc
value: [0.66666667 0.76923077 0.53846154 0.75 0.75 0.90909091
0.61538462 0.64285714 0.75 0.64285714]
mean value: 0.7034548784548784
key: train_jcc
value: [0.94949495 0.96938776 0.93939394 0.93069307 0.95918367 0.93
0.95918367 0.96907216 0.94897959 0.92929293]
mean value: 0.9484681746314754
MCC on Blind test: 0.61
Accuracy on Blind test: 0.8
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04003 0.03358912 0.06656361 0.0678463 0.10102797 0.07133985
0.07167172 0.07186866 0.08310914 0.07368183]
mean value: 0.06807281970977783
key: score_time
value: [0.01230335 0.0230813 0.02241945 0.02380037 0.02342224 0.01903343
0.03061485 0.0213964 0.02022529 0.01746082]
mean value: 0.021375751495361327
key: test_mcc
value: [0.52295779 0.71562645 0.4719399 0.71562645 0.62641448 0.82572282
0.44038551 0.52295779 0.71818182 0.42727273]
mean value: 0.5987085733914107
key: train_mcc
value: [0.8518477 0.80951848 0.83085028 0.79896965 0.83088812 0.84132139
0.8518477 0.83068309 0.84166312 0.89500244]
mean value: 0.8382591983762472
key: test_accuracy
value: [0.76190476 0.85714286 0.71428571 0.85714286 0.80952381 0.9047619
0.71428571 0.76190476 0.85714286 0.71428571]
mean value: 0.7952380952380952
key: train_accuracy
value: [0.92592593 0.9047619 0.91534392 0.8994709 0.91534392 0.92063492
0.92592593 0.91534392 0.92063492 0.94708995]
mean value: 0.919047619047619
key: test_fscore
value: [0.73684211 0.84210526 0.75 0.84210526 0.77777778 0.9
0.7 0.7826087 0.85714286 0.72727273]
mean value: 0.7915854689424483
key: train_fscore
value: [0.92631579 0.90526316 0.91666667 0.90052356 0.91489362 0.92063492
0.92553191 0.91489362 0.91891892 0.94791667]
mean value: 0.9191558829401187
key: test_precision
value: [0.77777778 0.88888889 0.64285714 0.88888889 0.875 1.
0.77777778 0.75 0.9 0.72727273]
mean value: 0.8228463203463203
key: train_precision
value: [0.92631579 0.90526316 0.90721649 0.89583333 0.92473118 0.91578947
0.92553191 0.91489362 0.93406593 0.92857143]
mean value: 0.917821232657928
key: test_recall
value: [0.7 0.8 0.9 0.8 0.7 0.81818182
0.63636364 0.81818182 0.81818182 0.72727273]
mean value: 0.7718181818181818
key: train_recall
value: [0.92631579 0.90526316 0.92631579 0.90526316 0.90526316 0.92553191
0.92553191 0.91489362 0.90425532 0.96808511]
mean value: 0.9206718924972005
key: test_roc_auc
value: [0.75909091 0.85454545 0.72272727 0.85454545 0.80454545 0.90909091
0.71818182 0.75909091 0.85909091 0.71363636]
mean value: 0.7954545454545454
key: train_roc_auc
value: [0.92592385 0.90475924 0.91528555 0.89944009 0.91539754 0.92066069
0.92592385 0.91534155 0.92054871 0.94720045]
mean value: 0.9190481522956326
key: test_jcc
value: [0.58333333 0.72727273 0.6 0.72727273 0.63636364 0.81818182
0.53846154 0.64285714 0.75 0.57142857]
mean value: 0.6595171495171496
key: train_jcc
value: [0.8627451 0.82692308 0.84615385 0.81904762 0.84313725 0.85294118
0.86138614 0.84313725 0.85 0.9009901 ]
mean value: 0.850646156406203
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.61443114 0.991431 1.31764698 1.33511066 1.55657887 1.8804636
2.35004854 1.59028721 1.81239128 2.00124836]
mean value: 1.644963765144348
key: score_time
value: [0.03566933 0.01466346 0.01258254 0.01850414 0.02412224 0.02417207
0.03879929 0.02366471 0.02081037 0.02138114]
mean value: 0.023436927795410158
key: test_mcc
value: [0.43007562 0.61818182 0.55161872 1. 0.80909091 0.82572282
0.55161872 0.62641448 0.90909091 0.55161872]
mean value: 0.6873432731232932
key: train_mcc
value: [1. 1. 1. 1. 1. 1.
1. 0.91555606 1. 1. ]
mean value: 0.9915556059051258
key: test_accuracy
value: [0.71428571 0.80952381 0.76190476 1. 0.9047619 0.9047619
0.76190476 0.80952381 0.95238095 0.76190476]
mean value: 0.8380952380952381
key: train_accuracy
value: [1. 1. 1. 1. 1. 1.
1. 0.95767196 1. 1. ]
mean value: 0.9957671957671957
key: test_fscore
value: [0.66666667 0.8 0.7826087 1. 0.9 0.9
0.73684211 0.83333333 0.95238095 0.73684211]
mean value: 0.8308673858559442
key: train_fscore
value: [1. 1. 1. 1. 1. 1.
1. 0.95789474 1. 1. ]
mean value: 0.9957894736842106
key: test_precision
value: [0.75 0.8 0.69230769 1. 0.9 1.
0.875 0.76923077 1. 0.875 ]
mean value: 0.8661538461538462
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 0.94791667 1. 1. ]
mean value: 0.9947916666666666
key: test_recall
value: [0.6 0.8 0.9 1. 0.9 0.81818182
0.63636364 0.90909091 0.90909091 0.63636364]
mean value: 0.8109090909090909
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 0.96808511 1. 1. ]
mean value: 0.9968085106382979
key: test_roc_auc
value: [0.70909091 0.80909091 0.76818182 1. 0.90454545 0.90909091
0.76818182 0.80454545 0.95454545 0.76818182]
mean value: 0.8395454545454546
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1.
1. 0.95772676 1. 1. ]
mean value: 0.9957726763717805
key: test_jcc
value: [0.5 0.66666667 0.64285714 1. 0.81818182 0.81818182
0.58333333 0.71428571 0.90909091 0.58333333]
mean value: 0.7235930735930736
key: train_jcc
value: [1. 1. 1. 1. 1. 1.
1. 0.91919192 1. 1. ]
mean value: 0.9919191919191919
MCC on Blind test: 0.58
Accuracy on Blind test: 0.8
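The "key / value / mean value" triples above can be read back into Python for checking. The sketch below is a minimal parser, assuming the layout stays exactly as printed here; the function name parse_cv_scores and the sample string are illustrative and are not part of the original script.

```python
import re

# Matches numpy-style printed floats such as "0.43007562", "1." or "-0.13762047".
NUMBER = re.compile(r"-?\d+\.?\d*(?:e[+-]?\d+)?")

def parse_cv_scores(text):
    """Collect each 'key:' block into {'values': [...], 'mean': float}."""
    scores = {}
    key, buf = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("key:"):
            # A new metric starts; reset the buffer of value lines.
            key, buf = line.split(":", 1)[1].strip(), []
        elif line.startswith("mean value:") and key is not None:
            values = [float(tok) for tok in NUMBER.findall(" ".join(buf))]
            scores[key] = {"values": values, "mean": float(line.split(":", 1)[1])}
            key = None
        elif key is not None:
            # "value:" line or its wrapped continuation line.
            buf.append(line[len("value:"):] if line.startswith("value:") else line)
    return scores

# test_mcc block copied from the LogisticRegressionCV results above.
sample = """key: test_mcc
value: [0.43007562 0.61818182 0.55161872 1. 0.80909091 0.82572282
0.55161872 0.62641448 0.90909091 0.55161872]
mean value: 0.6873432731232932
"""

parsed = parse_cv_scores(sample)
fold_scores = parsed["test_mcc"]["values"]
recomputed = sum(fold_scores) / len(fold_scores)
print(len(fold_scores), round(recomputed, 6))
```

Recomputing the mean from the ten fold values is a quick consistency check that no fold was lost when the array wrapped across lines.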
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.03653574 0.01316571 0.01315188 0.0131731 0.01319766 0.01305985
0.01309562 0.01336765 0.01304221 0.01303864]
mean value: 0.015482807159423828
key: score_time
value: [0.01301122 0.01235342 0.01248431 0.01237154 0.01234913 0.01239204
0.01234293 0.01240659 0.01235342 0.01222396]
mean value: 0.012428855895996094
key: test_mcc
value: [ 0.44038551 -0.13762047 0.39196475 0.33709993 0.52727273 0.71562645
0.14545455 0.33709993 0.45226702 0.52295779]
mean value: 0.37325081711851577
key: train_mcc
value: [0.55158352 0.46765481 0.54179779 0.46109894 0.53609614 0.47825095
0.44012799 0.55442155 0.4230863 0.52563909]
mean value: 0.4979757093671649
key: test_accuracy
value: [0.71428571 0.42857143 0.66666667 0.66666667 0.76190476 0.85714286
0.57142857 0.66666667 0.71428571 0.76190476]
mean value: 0.680952380952381
key: train_accuracy
value: [0.77248677 0.71957672 0.76719577 0.73015873 0.76719577 0.73544974
0.71957672 0.77248677 0.69312169 0.75661376]
mean value: 0.7433862433862434
key: test_fscore
value: [0.72727273 0.45454545 0.72 0.58823529 0.76190476 0.86956522
0.57142857 0.72 0.76923077 0.7826087 ]
mean value: 0.696479149154341
key: train_fscore
value: [0.7902439 0.76233184 0.78640777 0.72432432 0.77777778 0.75490196
0.70718232 0.7902439 0.74336283 0.77884615]
mean value: 0.7615622779466328
key: test_precision
value: [0.66666667 0.41666667 0.6 0.71428571 0.72727273 0.83333333
0.6 0.64285714 0.66666667 0.75 ]
mean value: 0.6617748917748918
key: train_precision
value: [0.73636364 0.6640625 0.72972973 0.74444444 0.74757282 0.7
0.73563218 0.72972973 0.63636364 0.71052632]
mean value: 0.7134424991862677
key: test_recall
value: [0.8 0.5 0.9 0.5 0.8 0.90909091
0.54545455 0.81818182 0.90909091 0.81818182]
mean value: 0.75
key: train_recall
value: [0.85263158 0.89473684 0.85263158 0.70526316 0.81052632 0.81914894
0.68085106 0.86170213 0.89361702 0.86170213]
mean value: 0.8232810750279955
key: test_roc_auc
value: [0.71818182 0.43181818 0.67727273 0.65909091 0.76363636 0.85454545
0.57272727 0.65909091 0.70454545 0.75909091]
mean value: 0.68
key: train_roc_auc
value: [0.77206047 0.71864502 0.76674132 0.73029115 0.76696529 0.73589026
0.7193729 0.77295633 0.69417693 0.75716685]
mean value: 0.7434266517357223
key: test_jcc
value: [0.57142857 0.29411765 0.5625 0.41666667 0.61538462 0.76923077
0.4 0.5625 0.625 0.64285714]
mean value: 0.5459685412626589
key: train_jcc
value: [0.65322581 0.61594203 0.648 0.56779661 0.63636364 0.60629921
0.54700855 0.65322581 0.5915493 0.63779528]
mean value: 0.6157206219394032
MCC on Blind test: 0.12
Accuracy on Blind test: 0.6
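Note on reading the per-fold values above: the first GaussianNB fold (test_accuracy 0.71428571, test_precision 0.66666667, test_recall 0.8) is consistent with a 21-sample test fold holding 8 true positives, 4 false positives, 2 false negatives and 7 true negatives. These counts are inferred from the logged metrics and are an assumption, since the log does not print a confusion matrix. The sketch below uses only the standard library and a hypothetical helper, `binary_metrics`, to show how the logged keys relate to those counts. The logged test_roc_auc matches the mean of recall and specificity, which suggests (but does not confirm) that it was scored on hard class labels rather than probabilities.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Compute the metrics named in the log from confusion-matrix counts.

    Hypothetical helper for illustration; not part of the logged script.
    """
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),              # Jaccard index of the positive class
        "roc_auc": (recall + specificity) / 2,   # AUC of a single hard-label operating point
        "mcc": (tp * tn - fp * fn) / mcc_den,    # Matthews correlation coefficient
    }

# Assumed (inferred) counts for the first GaussianNB test fold.
fold0 = binary_metrics(tp=8, fp=4, fn=2, tn=7)
for name, value in fold0.items():
    print(f"test_{name}: {value:.8f}")
```

Under that assumption the printed values agree with the first entry of each test_* array in the GaussianNB block. With only about 21 samples per test fold, a single reclassified sample moves accuracy by roughly 0.05, which helps explain the wide fold-to-fold spread in test_mcc.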
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01332068 0.01345658 0.01338673 0.01330256 0.01338434 0.01339674
0.01414132 0.01341271 0.01346302 0.01340342]
mean value: 0.013466811180114746
key: score_time
value: [0.0123136 0.01233149 0.01238203 0.01235437 0.01238656 0.01232052
0.02520466 0.01238585 0.01245117 0.0123682 ]
mean value: 0.013649845123291015
key: test_mcc
value: [0.23373675 0.42817442 0.14545455 0.45226702 0.42817442 0.55161872
0.06741999 0.23636364 0.55161872 0.60302269]
mean value: 0.36978509083764477
key: train_mcc
value: [0.55646909 0.49995455 0.56569532 0.45906255 0.53448943 0.46424351
0.54251375 0.55585218 0.49572783 0.4861571 ]
mean value: 0.5160165310554382
key: test_accuracy
value: [0.61904762 0.66666667 0.57142857 0.71428571 0.66666667 0.76190476
0.52380952 0.61904762 0.76190476 0.76190476]
mean value: 0.6666666666666666
key: train_accuracy
value: [0.76719577 0.74074074 0.77248677 0.71957672 0.75661376 0.72486772
0.76190476 0.77248677 0.73544974 0.73544974]
mean value: 0.7486772486772486
key: test_fscore
value: [0.55555556 0.46153846 0.57142857 0.625 0.46153846 0.73684211
0.44444444 0.63636364 0.73684211 0.70588235]
mean value: 0.5935435694336623
key: train_fscore
value: [0.73170732 0.7030303 0.73939394 0.67484663 0.7195122 0.68292683
0.72392638 0.74556213 0.6835443 0.69512195]
mean value: 0.7099571975217122
key: test_precision
value: [0.625 1. 0.54545455 0.83333333 1. 0.875
0.57142857 0.63636364 0.875 1. ]
mean value: 0.7961580086580087
key: train_precision
value: [0.86956522 0.82857143 0.87142857 0.80882353 0.85507246 0.8
0.85507246 0.84 0.84375 0.81428571]
mean value: 0.8386569388625016
key: test_recall
value: [0.5 0.3 0.6 0.5 0.3 0.63636364
0.36363636 0.63636364 0.63636364 0.54545455]
mean value: 0.5018181818181818
key: train_recall
value: [0.63157895 0.61052632 0.64210526 0.57894737 0.62105263 0.59574468
0.62765957 0.67021277 0.57446809 0.60638298]
mean value: 0.6158678611422173
key: test_roc_auc
value: [0.61363636 0.65 0.57272727 0.70454545 0.65 0.76818182
0.53181818 0.61818182 0.76818182 0.77272727]
mean value: 0.665
key: train_roc_auc
value: [0.76791713 0.74143337 0.77318029 0.72032475 0.75733483 0.72418813
0.76119821 0.77194849 0.73460246 0.73477044]
mean value: 0.7486898096304592
key: test_jcc
value: [0.38461538 0.3 0.4 0.45454545 0.3 0.58333333
0.28571429 0.46666667 0.58333333 0.54545455]
mean value: 0.43036630036630036
key: train_jcc
value: [0.57692308 0.54205607 0.58653846 0.50925926 0.56190476 0.51851852
0.56730769 0.59433962 0.51923077 0.53271028]
mean value: 0.5508788517464236
MCC on Blind test: 0.17
Accuracy on Blind test: 0.6
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01281309 0.01259708 0.01266217 0.0125277 0.01282144 0.01259828
0.01269078 0.01281691 0.01241755 0.0138123 ]
mean value: 0.012775731086730958
key: score_time
value: [0.02045655 0.03969097 0.05234361 0.0366714 0.03734803 0.03721189
0.03660083 0.03660321 0.0360918 0.05223918]
mean value: 0.038525748252868655
key: test_mcc
value: [-0.06741999 0.23373675 0.03015113 0.33636364 0.13858047 0.14545455
-0.26593594 0.42727273 0.33636364 0.18090681]
mean value: 0.14954737717597608
key: train_mcc
value: [0.57375166 0.50637592 0.61142844 0.55051844 0.49793339 0.55594205
0.6099783 0.5498651 0.55978224 0.55189788]
mean value: 0.5567473416942506
key: test_accuracy
value: [0.47619048 0.61904762 0.52380952 0.66666667 0.57142857 0.57142857
0.38095238 0.71428571 0.66666667 0.57142857]
mean value: 0.5761904761904761
key: train_accuracy
value: [0.78306878 0.75132275 0.8042328 0.77248677 0.74603175 0.77777778
0.8042328 0.77248677 0.77777778 0.76719577]
mean value: 0.7756613756613756
key: test_fscore
value: [0.35294118 0.55555556 0.375 0.66666667 0.4 0.57142857
0.13333333 0.72727273 0.66666667 0.47058824]
mean value: 0.49194529326882264
key: train_fscore
value: [0.76571429 0.73743017 0.79558011 0.75706215 0.72727273 0.77173913
0.79558011 0.75428571 0.76136364 0.73170732]
mean value: 0.7597735346629213
key: test_precision
value: [0.42857143 0.625 0.5 0.63636364 0.6 0.6
0.25 0.72727273 0.7 0.66666667]
mean value: 0.5733874458874458
key: train_precision
value: [0.8375 0.78571429 0.8372093 0.81707317 0.79012346 0.78888889
0.82758621 0.81481481 0.81707317 0.85714286]
mean value: 0.8173126154036517
key: test_recall
value: [0.3 0.5 0.3 0.7 0.3 0.54545455
0.09090909 0.72727273 0.63636364 0.36363636]
mean value: 0.44636363636363635
key: train_recall
value: [0.70526316 0.69473684 0.75789474 0.70526316 0.67368421 0.75531915
0.76595745 0.70212766 0.71276596 0.63829787]
mean value: 0.7111310190369541
key: test_roc_auc
value: [0.46818182 0.61363636 0.51363636 0.66818182 0.55909091 0.57272727
0.39545455 0.71363636 0.66818182 0.58181818]
mean value: 0.5754545454545454
key: train_roc_auc
value: [0.78348264 0.75162374 0.80447928 0.77284434 0.74641657 0.77765957
0.80403135 0.77211646 0.77743561 0.76651736]
mean value: 0.7756606942889137
key: test_jcc
value: [0.21428571 0.38461538 0.23076923 0.5 0.25 0.4
0.07142857 0.57142857 0.5 0.30769231]
mean value: 0.34302197802197804
key: train_jcc
value: [0.62037037 0.5840708 0.66055046 0.60909091 0.57142857 0.62831858
0.66055046 0.60550459 0.6146789 0.57692308]
mean value: 0.6131486712013626
MCC on Blind test: 0.05
Accuracy on Blind test: 0.53
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01701498 0.01680541 0.01711583 0.01694584 0.01700974 0.01702642
0.01688266 0.0170269 0.01668429 0.01705098]
mean value: 0.016956305503845213
key: score_time
value: [0.01340604 0.01336288 0.01338744 0.01337409 0.01337981 0.01351428
0.01340461 0.01340222 0.01311612 0.01343751]
mean value: 0.013378500938415527
key: test_mcc
value: [0.13762047 0.24120908 0.44038551 0.23373675 0.36244122 0.71818182
0.14545455 0.42727273 0.61818182 0.52727273]
mean value: 0.38517566540238385
key: train_mcc
value: [0.73654755 0.68655917 0.75666293 0.73549832 0.69804157 0.73867014
0.75694773 0.81109216 0.75994222 0.76164115]
mean value: 0.7441602942290075
key: test_accuracy
value: [0.57142857 0.61904762 0.71428571 0.61904762 0.66666667 0.85714286
0.57142857 0.71428571 0.80952381 0.76190476]
mean value: 0.6904761904761905
key: train_accuracy
value: [0.86772487 0.84126984 0.87830688 0.86772487 0.84656085 0.86772487
0.87830688 0.9047619 0.87830688 0.87830688]
mean value: 0.8708994708994708
key: test_fscore
value: [0.52631579 0.5 0.72727273 0.55555556 0.53333333 0.85714286
0.57142857 0.72727273 0.81818182 0.76190476]
mean value: 0.6578408141566037
key: train_fscore
value: [0.86486486 0.83333333 0.87830688 0.86772487 0.83798883 0.8603352
0.87567568 0.9010989 0.87150838 0.8700565 ]
mean value: 0.86608934204143
key: test_precision
value: [0.55555556 0.66666667 0.66666667 0.625 0.8 0.9
0.6 0.72727273 0.81818182 0.8 ]
mean value: 0.7159343434343435
key: train_precision
value: [0.88888889 0.88235294 0.88297872 0.87234043 0.89285714 0.90588235
0.89010989 0.93181818 0.91764706 0.92771084]
mean value: 0.8992586448924944
key: test_recall
value: [0.5 0.4 0.8 0.5 0.4 0.81818182
0.54545455 0.72727273 0.81818182 0.72727273]
mean value: 0.6236363636363637
key: train_recall
value: [0.84210526 0.78947368 0.87368421 0.86315789 0.78947368 0.81914894
0.86170213 0.87234043 0.82978723 0.81914894]
mean value: 0.8360022396416573
key: test_roc_auc
value: [0.56818182 0.60909091 0.71818182 0.61363636 0.65454545 0.85909091
0.57272727 0.71363636 0.80909091 0.76363636]
mean value: 0.6881818181818182
key: train_roc_auc
value: [0.86786114 0.84154535 0.87833147 0.86774916 0.8468645 0.8674692
0.87821948 0.90459127 0.87805151 0.87799552]
mean value: 0.8708678611422173
key: test_jcc
value: [0.35714286 0.33333333 0.57142857 0.38461538 0.36363636 0.75
0.4 0.57142857 0.69230769 0.61538462]
mean value: 0.5039277389277389
key: train_jcc
value: [0.76190476 0.71428571 0.78301887 0.76635514 0.72115385 0.75490196
0.77884615 0.82 0.77227723 0.77 ]
mean value: 0.7642743672809006
MCC on Blind test: 0.29
Accuracy on Blind test: 0.67
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.34039497 1.35265207 1.41161752 1.35997891 1.55491352 1.25403571
1.12641239 1.19176412 0.80105114 0.87126184]
mean value: 1.2264082193374635
key: score_time
value: [0.01979089 0.02390909 0.03533006 0.0225184 0.02177119 0.02182436
0.01981211 0.01836801 0.01252675 0.01538301]
mean value: 0.021123385429382323
key: test_mcc
value: [0.23636364 0.61818182 0.63305416 0.52727273 0.80909091 0.67419986
0.42727273 0.53935989 0.71818182 0.67419986]
mean value: 0.5857177416208371
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61904762 0.80952381 0.80952381 0.76190476 0.9047619 0.80952381
0.71428571 0.76190476 0.85714286 0.80952381]
mean value: 0.7857142857142857
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.6 0.8 0.81818182 0.76190476 0.9 0.77777778
0.72727273 0.8 0.85714286 0.77777778]
mean value: 0.782005772005772
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 0.8 0.75 0.72727273 0.9 1.
0.72727273 0.71428571 0.9 1. ]
mean value: 0.8118831168831169
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6 0.8 0.9 0.8 0.9 0.63636364
0.72727273 0.90909091 0.81818182 0.63636364]
mean value: 0.7727272727272727
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.61818182 0.80909091 0.81363636 0.76363636 0.90454545 0.81818182
0.71363636 0.75454545 0.85909091 0.81818182]
mean value: 0.7872727272727272
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.42857143 0.66666667 0.69230769 0.61538462 0.81818182 0.63636364
0.57142857 0.66666667 0.75 0.63636364]
mean value: 0.6481934731934732
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.29
Accuracy on Blind test: 0.67
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01981091 0.0168035 0.01414537 0.01470256 0.01423645 0.01411271
0.01426768 0.01455116 0.01446676 0.01585817]
mean value: 0.01529552936553955
key: score_time
value: [0.01231694 0.0095036 0.00918722 0.00877905 0.00922728 0.00901771
0.00886106 0.00959349 0.00907612 0.00917816]
mean value: 0.009474062919616699
key: test_mcc
value: [0.80909091 1. 0.55161872 0.53935989 1. 0.90909091
0.80909091 0.71562645 0.90909091 0.90829511]
mean value: 0.8151263804000601
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9047619 1. 0.76190476 0.76190476 1. 0.95238095
0.9047619 0.85714286 0.95238095 0.95238095]
mean value: 0.9047619047619048
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9 1. 0.7826087 0.70588235 1. 0.95238095
0.90909091 0.86956522 0.95238095 0.95652174]
mean value: 0.9028430818967903
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9 1. 0.69230769 0.85714286 1. 1.
0.90909091 0.83333333 1. 0.91666667]
mean value: 0.9108541458541458
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9 1. 0.9 0.6 1. 0.90909091
0.90909091 0.90909091 0.90909091 1. ]
mean value: 0.9036363636363636
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90454545 1. 0.76818182 0.75454545 1. 0.95454545
0.90454545 0.85454545 0.95454545 0.95 ]
mean value: 0.9045454545454545
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.81818182 1. 0.64285714 0.54545455 1. 0.90909091
0.83333333 0.76923077 0.90909091 0.91666667]
mean value: 0.8343906093906094
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.09894037 0.09839034 0.09680152 0.09823012 0.09769559 0.09740853
0.09786224 0.09949851 0.09909749 0.09787583]
mean value: 0.09818005561828613
key: score_time
value: [0.01770234 0.01870775 0.01915932 0.01832509 0.01789999 0.01840591
0.01811194 0.0182302 0.01793718 0.01793575]
mean value: 0.01824154853820801
key: test_mcc
value: [0.44038551 0.62641448 0.60302269 0.42727273 0.90829511 0.80909091
0.44038551 0.53935989 0.71818182 0.82572282]
mean value: 0.6338131459152349
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71428571 0.80952381 0.76190476 0.71428571 0.95238095 0.9047619
0.71428571 0.76190476 0.85714286 0.9047619 ]
mean value: 0.8095238095238095
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.72727273 0.77777778 0.8 0.7 0.94736842 0.90909091
0.7 0.8 0.85714286 0.9 ]
mean value: 0.8118652692336903
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.875 0.66666667 0.7 1. 0.90909091
0.77777778 0.71428571 0.9 1. ]
mean value: 0.8209487734487735
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.7 1. 0.7 0.9 0.90909091
0.63636364 0.90909091 0.81818182 0.81818182]
mean value: 0.8190909090909091
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.71818182 0.80454545 0.77272727 0.71363636 0.95 0.90454545
0.71818182 0.75454545 0.85909091 0.90909091]
mean value: 0.8104545454545454
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.57142857 0.63636364 0.66666667 0.53846154 0.9 0.83333333
0.53846154 0.66666667 0.75 0.81818182]
mean value: 0.6919563769563769
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.0100956 0.01037574 0.00979137 0.00928307 0.00931144 0.00919366
0.01059389 0.01200008 0.00972986 0.01077032]
mean value: 0.010114502906799317
key: score_time
value: [0.00959349 0.00903225 0.00895309 0.00887442 0.00888252 0.00879502
0.01142311 0.01174235 0.01009965 0.00888896]
mean value: 0.009628486633300782
key: test_mcc
value: [0.13762047 0.23373675 0.24771685 0.23636364 0.53935989 0.53935989
0.06741999 0.52295779 0.82572282 0.71818182]
mean value: 0.40684398983090986
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.57142857 0.61904762 0.61904762 0.61904762 0.76190476 0.76190476
0.52380952 0.76190476 0.9047619 0.85714286]
mean value: 0.7
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.52631579 0.55555556 0.63636364 0.6 0.70588235 0.8
0.44444444 0.7826087 0.9 0.85714286]
mean value: 0.6808313331573528
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.55555556 0.625 0.58333333 0.6 0.85714286 0.71428571
0.57142857 0.75 1. 0.9 ]
mean value: 0.7156746031746032
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5 0.5 0.7 0.6 0.6 0.90909091
0.36363636 0.81818182 0.81818182 0.81818182]
mean value: 0.6627272727272727
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.56818182 0.61363636 0.62272727 0.61818182 0.75454545 0.75454545
0.53181818 0.75909091 0.90909091 0.85909091]
mean value: 0.6990909090909091
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.35714286 0.38461538 0.46666667 0.42857143 0.54545455 0.66666667
0.28571429 0.64285714 0.81818182 0.75 ]
mean value: 0.5345870795870796
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.11
Accuracy on Blind test: 0.47
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.31620884 1.31651926 1.31475592 1.30594683 1.28883219 1.36383224
1.30705667 1.3408339 1.34590888 1.31465459]
mean value: 1.321454930305481
key: score_time
value: [0.09206462 0.09813571 0.09290099 0.09021497 0.09371018 0.09852195
0.09075403 0.09710979 0.09322071 0.09802985]
mean value: 0.09446628093719482
key: test_mcc
value: [0.52295779 0.71562645 0.39196475 0.52295779 0.90909091 1.
0.71818182 0.71562645 1. 0.61818182]
mean value: 0.7114587764939949
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.76190476 0.85714286 0.66666667 0.76190476 0.95238095 1.
0.85714286 0.85714286 1. 0.80952381]
mean value: 0.8523809523809524
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.73684211 0.84210526 0.72 0.73684211 0.95238095 1.
0.85714286 0.86956522 1. 0.81818182]
mean value: 0.8533060318781143
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.77777778 0.88888889 0.6 0.77777778 0.90909091 1.
0.9 0.83333333 1. 0.81818182]
mean value: 0.8505050505050505
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.7 0.8 0.9 0.7 1. 1.
0.81818182 0.90909091 1. 0.81818182]
mean value: 0.8645454545454545
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75909091 0.85454545 0.67727273 0.75909091 0.95454545 1.
0.85909091 0.85454545 1. 0.80909091]
mean value: 0.8527272727272728
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: test_jcc
value: [0.58333333 0.72727273 0.5625 0.58333333 0.90909091 1.
0.75 0.76923077 1. 0.69230769]
mean value: 0.7577068764568765
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.85998034 1.44642282 1.04460621 0.97317791 0.90930486 0.89995694
0.87365675 0.88928413 0.94170761 0.95756435]
mean value: 0.9795661926269531
key: score_time
value: [0.2091279 0.2227211 0.17214561 0.14424253 0.16997743 0.16527343
0.17143083 0.20080829 0.17865419 0.12289047]
mean value: 0.17572717666625975
key: test_mcc
value: [0.52295779 0.53935989 0.39196475 0.62641448 0.80909091 0.82275335
0.52727273 0.61818182 1. 0.71818182]
mean value: 0.6576177533597088
key: train_mcc
value: [0.95788064 0.95788064 0.96830553 0.95788064 0.95788064 0.94757483
0.97905701 0.93736014 0.95789003 0.95789003]
mean value: 0.9579600124491399
key: test_accuracy
value: [0.76190476 0.76190476 0.66666667 0.80952381 0.9047619 0.9047619
0.76190476 0.80952381 1. 0.85714286]
mean value: 0.8238095238095238
key: train_accuracy
value: [0.97883598 0.97883598 0.98412698 0.97883598 0.97883598 0.97354497
0.98941799 0.96825397 0.97883598 0.97883598]
mean value: 0.9788359788359788
key: test_fscore
value: [0.73684211 0.70588235 0.72 0.77777778 0.9 0.91666667
0.76190476 0.81818182 1. 0.85714286]
mean value: 0.8194398339878216
key: train_fscore
value: [0.97916667 0.97916667 0.98429319 0.97916667 0.97916667 0.97382199
0.98947368 0.96875 0.97894737 0.97894737]
mean value: 0.9790900270965371
key: test_precision
value: [0.77777778 0.85714286 0.6 0.875 0.9 0.84615385
0.8 0.81818182 1. 0.9 ]
mean value: 0.83742562992563
key: train_precision
value: [0.96907216 0.96907216 0.97916667 0.96907216 0.96907216 0.95876289
0.97916667 0.94897959 0.96875 0.96875 ]
mean value: 0.9679864471561821
key: test_recall
value: [0.7 0.6 0.9 0.7 0.9 1.
0.72727273 0.81818182 1. 0.81818182]
mean value: 0.8163636363636364
key: train_recall
value: [0.98947368 0.98947368 0.98947368 0.98947368 0.98947368 0.9893617
1. 0.9893617 0.9893617 0.9893617 ]
mean value: 0.990481522956327
key: test_roc_auc
value: [0.75909091 0.75454545 0.67727273 0.80454545 0.90454545 0.9
0.76363636 0.80909091 1. 0.85909091]
mean value: 0.8231818181818182
key: train_roc_auc
value: [0.9787794 0.9787794 0.98409854 0.9787794 0.9787794 0.97362822
0.98947368 0.96836506 0.97889138 0.97889138]
mean value: 0.9788465845464726
key: test_jcc
value: [0.58333333 0.54545455 0.5625 0.63636364 0.81818182 0.84615385
0.61538462 0.69230769 1. 0.75 ]
mean value: 0.7049679487179488
key: train_jcc
value: [0.95918367 0.95918367 0.96907216 0.95918367 0.95918367 0.94897959
0.97916667 0.93939394 0.95876289 0.95876289]
mean value: 0.9590872829919221
MCC on Blind test: 0.58
Accuracy on Blind test: 0.8
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01177764 0.01067424 0.01179218 0.01120353 0.01094937 0.01100659
0.01069164 0.01066709 0.01039362 0.01051068]
mean value: 0.010966658592224121
key: score_time
value: [0.01029491 0.00953531 0.00973797 0.01007605 0.01004219 0.00984097
0.01000714 0.01029038 0.009624 0.01030111]
mean value: 0.009975004196166991
key: test_mcc
value: [0.23373675 0.42817442 0.14545455 0.45226702 0.42817442 0.55161872
0.06741999 0.23636364 0.55161872 0.60302269]
mean value: 0.36978509083764477
key: train_mcc
value: [0.55646909 0.49995455 0.56569532 0.45906255 0.53448943 0.46424351
0.54251375 0.55585218 0.49572783 0.4861571 ]
mean value: 0.5160165310554382
key: test_accuracy
value: [0.61904762 0.66666667 0.57142857 0.71428571 0.66666667 0.76190476
0.52380952 0.61904762 0.76190476 0.76190476]
mean value: 0.6666666666666666
key: train_accuracy
value: [0.76719577 0.74074074 0.77248677 0.71957672 0.75661376 0.72486772
0.76190476 0.77248677 0.73544974 0.73544974]
mean value: 0.7486772486772486
key: test_fscore
value: [0.55555556 0.46153846 0.57142857 0.625 0.46153846 0.73684211
0.44444444 0.63636364 0.73684211 0.70588235]
mean value: 0.5935435694336623
key: train_fscore
value: [0.73170732 0.7030303 0.73939394 0.67484663 0.7195122 0.68292683
0.72392638 0.74556213 0.6835443 0.69512195]
mean value: 0.7099571975217122
key: test_precision
value: [0.625 1. 0.54545455 0.83333333 1. 0.875
0.57142857 0.63636364 0.875 1. ]
mean value: 0.7961580086580087
key: train_precision
value: [0.86956522 0.82857143 0.87142857 0.80882353 0.85507246 0.8
0.85507246 0.84 0.84375 0.81428571]
mean value: 0.8386569388625016
key: test_recall
value: [0.5 0.3 0.6 0.5 0.3 0.63636364
0.36363636 0.63636364 0.63636364 0.54545455]
mean value: 0.5018181818181818
key: train_recall
value: [0.63157895 0.61052632 0.64210526 0.57894737 0.62105263 0.59574468
0.62765957 0.67021277 0.57446809 0.60638298]
mean value: 0.6158678611422173
key: test_roc_auc
value: [0.61363636 0.65 0.57272727 0.70454545 0.65 0.76818182
0.53181818 0.61818182 0.76818182 0.77272727]
mean value: 0.665
key: train_roc_auc
value: [0.76791713 0.74143337 0.77318029 0.72032475 0.75733483 0.72418813
0.76119821 0.77194849 0.73460246 0.73477044]
mean value: 0.7486898096304592
key: test_jcc
value: [0.38461538 0.3 0.4 0.45454545 0.3 0.58333333
0.28571429 0.46666667 0.58333333 0.54545455]
mean value: 0.43036630036630036
key: train_jcc
value: [0.57692308 0.54205607 0.58653846 0.50925926 0.56190476 0.51851852
0.56730769 0.59433962 0.51923077 0.53271028]
mean value: 0.5508788517464236
MCC on Blind test: 0.17
Accuracy on Blind test: 0.6
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [1.0007422 1.08978605 0.60352254 0.19705272 0.85835528 0.39761066
0.57176566 0.29660773 0.19528532 0.50309563]
mean value: 0.5713823795318603
key: score_time
value: [0.013587 0.01366115 0.0136435 0.01247692 0.01475835 0.01236415
0.01414704 0.01221514 0.01440883 0.01542115]
mean value: 0.013668322563171386
key: test_mcc
value: [0.82275335 1. 0.39196475 0.80909091 0.90909091 1.
0.80909091 0.90829511 1. 0.71818182]
mean value: 0.8368467750842328
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9047619 1. 0.66666667 0.9047619 0.95238095 1.
0.9047619 0.95238095 1. 0.85714286]
mean value: 0.9142857142857143
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 1. 0.72 0.9 0.95238095 1.
0.90909091 0.95652174 1. 0.85714286]
mean value: 0.9184025346634043
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.6 0.9 0.90909091 1.
0.90909091 0.91666667 1. 0.9 ]
mean value: 0.9134848484848485
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 1. 0.9 0.9 1. 1.
0.90909091 1. 1. 0.81818182]
mean value: 0.9327272727272727
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 1. 0.67727273 0.90454545 0.95454545 1.
0.90454545 0.95 1. 0.85909091]
mean value: 0.915
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 1. 0.5625 0.81818182 0.90909091 1.
0.83333333 0.91666667 1. 0.75 ]
mean value: 0.8589772727272728
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.03846598 0.02921033 0.0438509 0.10852194 0.03383636 0.02990055
0.03976679 0.04133964 0.06952095 0.03217459]
mean value: 0.046658802032470706
key: score_time
value: [0.04556179 0.01182222 0.03038454 0.01200247 0.01077509 0.01119876
0.01961231 0.01819873 0.01630306 0.01722431]
mean value: 0.01930832862854004
key: test_mcc
value: [0.43007562 0.52727273 0.33028913 0.71818182 0.74161985 0.71818182
0.33028913 0.80909091 0.82572282 0.63305416]
mean value: 0.6063777984708973
key: train_mcc
value: [0.96830553 0.96874655 0.97905237 0.96830907 0.96830907 0.95767077
0.97883539 0.96830907 0.94713854 0.96830553]
mean value: 0.9672981891345079
key: test_accuracy
value: [0.71428571 0.76190476 0.66666667 0.85714286 0.85714286 0.85714286
0.66666667 0.9047619 0.9047619 0.80952381]
mean value: 0.7999999999999999
key: train_accuracy
value: [0.98412698 0.98412698 0.98941799 0.98412698 0.98412698 0.97883598
0.98941799 0.98412698 0.97354497 0.98412698]
mean value: 0.9835978835978836
key: test_fscore
value: [0.66666667 0.76190476 0.63157895 0.85714286 0.82352941 0.85714286
0.69565217 0.90909091 0.9 0.8 ]
mean value: 0.7902708584994222
key: train_fscore
value: [0.98429319 0.98395722 0.98958333 0.98412698 0.98412698 0.9787234
0.9893617 0.98412698 0.97326203 0.98395722]
mean value: 0.9835519056402777
key: test_precision
value: [0.75 0.72727273 0.66666667 0.81818182 1. 0.9
0.66666667 0.90909091 1. 0.88888889]
mean value: 0.8326767676767677
key: train_precision
value: [0.97916667 1. 0.97938144 0.9893617 0.9893617 0.9787234
0.9893617 0.97894737 0.97849462 0.98924731]
mean value: 0.9852045924508858
key: test_recall
value: [0.6 0.8 0.6 0.9 0.7 0.81818182
0.72727273 0.90909091 0.81818182 0.72727273]
mean value: 0.76
key: train_recall
value: [0.98947368 0.96842105 1. 0.97894737 0.97894737 0.9787234
0.9893617 0.9893617 0.96808511 0.9787234 ]
mean value: 0.9820044792833147
key: test_roc_auc
value: [0.70909091 0.76363636 0.66363636 0.85909091 0.85 0.85909091
0.66363636 0.90454545 0.90909091 0.81363636]
mean value: 0.7995454545454546
key: train_roc_auc
value: [0.98409854 0.98421053 0.9893617 0.98415454 0.98415454 0.97883539
0.98941769 0.98415454 0.97351624 0.98409854]
mean value: 0.9836002239641657
key: test_jcc
value: [0.5 0.61538462 0.46153846 0.75 0.7 0.75
0.53333333 0.83333333 0.81818182 0.66666667]
mean value: 0.6628438228438228
key: train_jcc
value: [0.96907216 0.96842105 0.97938144 0.96875 0.96875 0.95833333
0.97894737 0.96875 0.94791667 0.96842105]
mean value: 0.9676743081931634
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01121068 0.01103139 0.01107359 0.01116538 0.0111959 0.0111444
0.01113462 0.0111568 0.01111794 0.01115346]
mean value: 0.011138415336608887
key: score_time
value: [0.0101912 0.01027894 0.01037192 0.01032901 0.01054072 0.01028752
0.01039696 0.01047206 0.01040912 0.01026344]
mean value: 0.010354089736938476
key: test_mcc
value: [0.33028913 0.05504819 0.39196475 0.43007562 0.44038551 0.63305416
0.05504819 0.24120908 0.52295779 0.23636364]
mean value: 0.3336396040864597
key: train_mcc
value: [0.43065616 0.50382186 0.43289183 0.41958895 0.45044462 0.43991059
0.42961362 0.49404873 0.47201413 0.46267525]
mean value: 0.45356657385003807
key: test_accuracy
value: [0.66666667 0.52380952 0.66666667 0.71428571 0.71428571 0.80952381
0.52380952 0.61904762 0.76190476 0.61904762]
mean value: 0.6619047619047619
key: train_accuracy
value: [0.71428571 0.75132275 0.71428571 0.70899471 0.72486772 0.71957672
0.71428571 0.74603175 0.73544974 0.73015873]
mean value: 0.7259259259259259
key: test_fscore
value: [0.63157895 0.54545455 0.72 0.66666667 0.72727273 0.8
0.5 0.69230769 0.7826087 0.63636364]
mean value: 0.6702252911085863
key: train_fscore
value: [0.73 0.76142132 0.73529412 0.72361809 0.73469388 0.7253886
0.72164948 0.75510204 0.74226804 0.74111675]
mean value: 0.7370552324342122
key: test_precision
value: [0.66666667 0.5 0.6 0.75 0.66666667 0.88888889
0.55555556 0.6 0.75 0.63636364]
mean value: 0.6614141414141415
key: train_precision
value: [0.6952381 0.73529412 0.68807339 0.69230769 0.71287129 0.70707071
0.7 0.7254902 0.72 0.70873786]
mean value: 0.7085083354043781
key: test_recall
value: [0.6 0.6 0.9 0.6 0.8 0.72727273
0.45454545 0.81818182 0.81818182 0.63636364]
mean value: 0.6954545454545454
key: train_recall
value: [0.76842105 0.78947368 0.78947368 0.75789474 0.75789474 0.74468085
0.74468085 0.78723404 0.76595745 0.77659574]
mean value: 0.7682306830907055
key: test_roc_auc
value: [0.66363636 0.52727273 0.67727273 0.70909091 0.71818182 0.81363636
0.52727273 0.60909091 0.75909091 0.61818182]
mean value: 0.6622727272727273
key: train_roc_auc
value: [0.71399776 0.75111982 0.71388578 0.7087346 0.72469205 0.71970885
0.71444569 0.7462486 0.7356103 0.73040314]
mean value: 0.7258846584546472
key: test_jcc
value: [0.46153846 0.375 0.5625 0.5 0.57142857 0.66666667
0.33333333 0.52941176 0.64285714 0.46666667]
mean value: 0.5109402607196725
key: train_jcc
value: [0.57480315 0.6147541 0.58139535 0.56692913 0.58064516 0.56910569
0.56451613 0.60655738 0.59016393 0.58870968]
mean value: 0.5837579700936688
MCC on Blind test: 0.12
Accuracy on Blind test: 0.6
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01644301 0.01346326 0.01460147 0.01373553 0.01718259 0.01639462
0.01637864 0.01489592 0.01578617 0.01493788]
mean value: 0.015381908416748047
key: score_time
value: [0.0110271 0.01079273 0.01092768 0.01038527 0.01035404 0.01038051
0.01045108 0.01053333 0.01052833 0.0103631 ]
mean value: 0.01057431697845459
key: test_mcc
value: [0.4719399 0.61818182 0.4719399 0.74161985 0.71562645 0.60302269
0.23373675 0.71562645 0.74161985 0.60302269]
mean value: 0.5916336343526927
key: train_mcc
value: [0.86125076 0.84693232 0.84923609 0.66188185 0.95789003 0.87061974
0.8157737 0.89595041 0.80682683 0.88957791]
mean value: 0.8455939633920262
key: test_accuracy
value: [0.71428571 0.80952381 0.71428571 0.85714286 0.85714286 0.76190476
0.61904762 0.85714286 0.85714286 0.76190476]
mean value: 0.780952380952381
key: train_accuracy
value: [0.92592593 0.92063492 0.92063492 0.8042328 0.97883598 0.93121693
0.8994709 0.94708995 0.89417989 0.94179894]
mean value: 0.9164021164021163
key: test_fscore
value: [0.75 0.8 0.75 0.82352941 0.84210526 0.70588235
0.66666667 0.86956522 0.88 0.70588235]
mean value: 0.7793631264862925
key: train_fscore
value: [0.93137255 0.92537313 0.92610837 0.75816993 0.9787234 0.92571429
0.90821256 0.94505495 0.90384615 0.93785311]
mean value: 0.9140428448974536
key: test_precision
value: [0.64285714 0.8 0.64285714 1. 0.88888889 1.
0.61538462 0.83333333 0.78571429 1. ]
mean value: 0.8209035409035409
key: train_precision
value: [0.87155963 0.87735849 0.87037037 1. 0.98924731 1.
0.83185841 0.97727273 0.8245614 1. ]
mean value: 0.9242228343653033
key: test_recall
value: [0.9 0.8 0.9 0.7 0.8 0.54545455
0.72727273 0.90909091 1. 0.54545455]
mean value: 0.7827272727272727
key: train_recall
value: [1. 0.97894737 0.98947368 0.61052632 0.96842105 0.86170213
1. 0.91489362 1. 0.88297872]
mean value: 0.9206942889137738
key: test_roc_auc
value: [0.72272727 0.80909091 0.72272727 0.85 0.85454545 0.77272727
0.61363636 0.85454545 0.85 0.77272727]
mean value: 0.7822727272727272
key: train_roc_auc
value: [0.92553191 0.92032475 0.92026876 0.80526316 0.97889138 0.93085106
0.9 0.94692049 0.89473684 0.94148936]
mean value: 0.916427771556551
key: test_jcc
value: [0.6 0.66666667 0.6 0.7 0.72727273 0.54545455
0.5 0.76923077 0.78571429 0.54545455]
mean value: 0.6439793539793539
key: train_jcc
value: [0.87155963 0.86111111 0.86238532 0.61052632 0.95833333 0.86170213
0.83185841 0.89583333 0.8245614 0.88297872]
mean value: 0.846084970934794
MCC on Blind test: 0.49
Accuracy on Blind test: 0.67
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0150187 0.01364851 0.01349545 0.01297951 0.01252437 0.01285005
0.01290178 0.0133462 0.01365423 0.01284862]
mean value: 0.013326740264892578
key: score_time
value: [0.01063037 0.01029801 0.01048899 0.01053691 0.01031756 0.0104537
0.01043105 0.01037478 0.01037979 0.01044369]
mean value: 0.01043548583984375
key: test_mcc
value: [0.33709993 0.53935989 0.53935989 0.38924947 0.82572282 0.50874702
0.42727273 0.66332496 1. 0.52727273]
mean value: 0.5757409438784044
key: train_mcc
value: [0.80452249 0.84518345 0.79793785 0.47083798 0.80904214 0.61283493
0.85498064 0.88405964 0.88405964 0.89601922]
mean value: 0.7859477973314178
key: test_accuracy
value: [0.66666667 0.76190476 0.76190476 0.61904762 0.9047619 0.71428571
0.71428571 0.80952381 1. 0.76190476]
mean value: 0.7714285714285715
key: train_accuracy
value: [0.8994709 0.92063492 0.88888889 0.68253968 0.8994709 0.77248677
0.92592593 0.94179894 0.94179894 0.94708995]
mean value: 0.882010582010582
key: test_fscore
value: [0.58823529 0.70588235 0.70588235 0.71428571 0.90909091 0.78571429
0.72727273 0.84615385 1. 0.76190476]
mean value: 0.7744422244422244
key: train_fscore
value: [0.89385475 0.91712707 0.87573964 0.76 0.90731707 0.81385281
0.92857143 0.94240838 0.94240838 0.94845361]
mean value: 0.892973314316607
key: test_precision
value: [0.71428571 0.85714286 0.85714286 0.55555556 0.83333333 0.64705882
0.72727273 0.73333333 1. 0.8 ]
mean value: 0.7725125201595789
key: train_precision
value: [0.95238095 0.96511628 1. 0.61290323 0.84545455 0.68613139
0.89215686 0.92783505 0.92783505 0.92 ]
mean value: 0.8729813355410913
key: test_recall
value: [0.5 0.6 0.6 1. 1. 1.
0.72727273 1. 1. 0.72727273]
mean value: 0.8154545454545454
key: train_recall
value: [0.84210526 0.87368421 0.77894737 1. 0.97894737 1.
0.96808511 0.95744681 0.95744681 0.9787234 ]
mean value: 0.933538633818589
key: test_roc_auc
value: [0.65909091 0.75454545 0.75454545 0.63636364 0.90909091 0.7
0.71363636 0.8 1. 0.76363636]
mean value: 0.769090909090909
key: train_roc_auc
value: [0.89977604 0.92088466 0.88947368 0.68085106 0.89904815 0.77368421
0.92614782 0.9418813 0.9418813 0.94725644]
mean value: 0.8820884658454647
key: test_jcc
value: [0.41666667 0.54545455 0.54545455 0.55555556 0.83333333 0.64705882
0.57142857 0.73333333 1. 0.61538462]
mean value: 0.6463669990140578
key: train_jcc
value: [0.80808081 0.84693878 0.77894737 0.61290323 0.83035714 0.68613139
0.86666667 0.89108911 0.89108911 0.90196078]
mean value: 0.8114164376339148
MCC on Blind test: 0.58
Accuracy on Blind test: 0.8
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.14535904 0.14270997 0.1418457 0.14258361 0.14160824 0.14119768
0.14122701 0.15788913 0.1422658 0.14225483]
mean value: 0.14389410018920898
key: score_time
value: [0.01751041 0.0176034 0.01684165 0.01729679 0.01721644 0.01762438
0.01736569 0.01751566 0.01748872 0.01746202]
mean value: 0.017392516136169434
key: test_mcc
value: [0.71562645 0.90829511 0.26967994 0.71562645 0.90909091 0.80909091
0.71818182 0.71818182 1. 0.71562645]
mean value: 0.7479399847756402
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.85714286 0.95238095 0.61904762 0.85714286 0.95238095 0.9047619
0.85714286 0.85714286 1. 0.85714286]
mean value: 0.8714285714285714
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.84210526 0.94736842 0.66666667 0.84210526 0.95238095 0.90909091
0.85714286 0.85714286 1. 0.86956522]
mean value: 0.8743568407183968
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 1. 0.57142857 0.88888889 0.90909091 0.90909091
0.9 0.9 1. 0.83333333]
mean value: 0.88007215007215
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.9 0.8 0.8 1. 0.90909091
0.81818182 0.81818182 1. 0.90909091]
mean value: 0.8754545454545455
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.85454545 0.95 0.62727273 0.85454545 0.95454545 0.90454545
0.85909091 0.85909091 1. 0.85454545]
mean value: 0.8718181818181818
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.72727273 0.9 0.5 0.72727273 0.90909091 0.83333333
0.75 0.75 1. 0.76923077]
mean value: 0.7866200466200466
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03950834 0.0361855 0.0394659 0.03514695 0.03398728 0.0323205
0.03702044 0.03738832 0.03468561 0.03310966]
mean value: 0.035881853103637694
key: score_time
value: [0.01812267 0.01860189 0.01861978 0.08073974 0.01802468 0.01793027
0.02523518 0.01894164 0.01871896 0.01882577]
mean value: 0.025376057624816893
key: test_mcc
value: [0.62641448 0.82275335 0.4719399 0.90829511 1. 0.90829511
0.80909091 0.90829511 0.82572282 0.90829511]
mean value: 0.8189101896090047
key: train_mcc
value: [0.97883539 0.97883539 0.98947368 0.98947368 1. 0.98947251
0.98947368 0.97905237 0.98947368 0.98947251]
mean value: 0.987356290106085
key: test_accuracy
value: [0.80952381 0.9047619 0.71428571 0.95238095 1. 0.95238095
0.9047619 0.95238095 0.9047619 0.95238095]
mean value: 0.9047619047619048
key: train_accuracy
value: [0.98941799 0.98941799 0.99470899 0.99470899 1. 0.99470899
0.99470899 0.98941799 0.99470899 0.99470899]
mean value: 0.9936507936507937
key: test_fscore
value: [0.77777778 0.88888889 0.75 0.94736842 1. 0.95652174
0.90909091 0.95652174 0.9 0.95652174]
mean value: 0.9042691214201511
key: train_fscore
value: [0.98947368 0.98947368 0.99470899 0.99470899 1. 0.99465241
0.99470899 0.98924731 0.99470899 0.99465241]
mean value: 0.9936335471919213
key: test_precision
value: [0.875 1. 0.64285714 1. 1. 0.91666667
0.90909091 0.91666667 1. 0.91666667]
mean value: 0.9176948051948052
key: train_precision
value: [0.98947368 0.98947368 1. 1. 1. 1.
0.98947368 1. 0.98947368 1. ]
mean value: 0.9957894736842106
key: test_recall
value: [0.7 0.8 0.9 0.9 1. 1.
0.90909091 1. 0.81818182 1. ]
mean value: 0.9027272727272727
key: train_recall
value: [0.98947368 0.98947368 0.98947368 0.98947368 1. 0.9893617
1. 0.9787234 1. 0.9893617 ]
mean value: 0.9915341545352744
key: test_roc_auc
value: [0.80454545 0.9 0.72272727 0.95 1. 0.95
0.90454545 0.95 0.90909091 0.95 ]
mean value: 0.9040909090909091
key: train_roc_auc
value: [0.98941769 0.98941769 0.99473684 0.99473684 1. 0.99468085
0.99473684 0.9893617 0.99473684 0.99468085]
mean value: 0.9936506159014558
key: test_jcc
value: [0.63636364 0.8 0.6 0.9 1. 0.91666667
0.83333333 0.91666667 0.81818182 0.91666667]
mean value: 0.8337878787878787
key: train_jcc
value: [0.97916667 0.97916667 0.98947368 0.98947368 1. 0.9893617
0.98947368 0.9787234 0.98947368 0.9893617 ]
mean value: 0.9873674878686076
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.08917499 0.13380027 0.17171621 0.11417508 0.11032367 0.10746312
0.12534523 0.12104058 0.1084094 0.1078198 ]
mean value: 0.11892683506011963
key: score_time
value: [0.03536677 0.02449703 0.02127337 0.01806617 0.02497578 0.02218723
0.02117968 0.02874351 0.02377129 0.03750396]
mean value: 0.025756478309631348
key: test_mcc
value: [0.03739788 0.53935989 0.52295779 0.23636364 0.62641448 0.33028913
0.18090681 0.52727273 0.39196475 0.55161872]
mean value: 0.3944545813288632
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.52380952 0.76190476 0.76190476 0.61904762 0.80952381 0.66666667
0.57142857 0.76190476 0.66666667 0.76190476]
mean value: 0.6904761904761905
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.44444444 0.70588235 0.73684211 0.6 0.77777778 0.69565217
0.47058824 0.76190476 0.58823529 0.73684211]
mean value: 0.6518169250919285
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.5 0.85714286 0.77777778 0.6 0.875 0.66666667
0.66666667 0.8 0.83333333 0.875 ]
mean value: 0.7451587301587301
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.4 0.6 0.7 0.6 0.7 0.72727273
0.36363636 0.72727273 0.45454545 0.63636364]
mean value: 0.5909090909090909
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.51818182 0.75454545 0.75909091 0.61818182 0.80454545 0.66363636
0.58181818 0.76363636 0.67727273 0.76818182]
mean value: 0.6909090909090909
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.28571429 0.54545455 0.58333333 0.42857143 0.63636364 0.53333333
0.30769231 0.61538462 0.41666667 0.58333333]
mean value: 0.4935847485847486
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.46492267 0.45418954 0.45839477 0.45019031 0.45531964 0.45568514
0.44885039 0.45692539 0.45565152 0.44814992]
mean value: 0.45482792854309084
key: score_time
value: [0.01082897 0.01071215 0.01069975 0.0107615 0.01069736 0.01068664
0.01067924 0.01079154 0.01067519 0.01112723]
mean value: 0.010765957832336425
key: test_mcc
value: [0.90829511 1. 0.4719399 0.90909091 0.90909091 0.90829511
0.80909091 0.90829511 1. 0.90829511]
mean value: 0.8732393055913986
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95238095 1. 0.71428571 0.95238095 0.95238095 0.95238095
0.9047619 0.95238095 1. 0.95238095]
mean value: 0.9333333333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94736842 1. 0.75 0.95238095 0.95238095 0.95652174
0.90909091 0.95652174 1. 0.95652174]
mean value: 0.938078645229675
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.64285714 0.90909091 0.90909091 0.91666667
0.90909091 0.91666667 1. 0.91666667]
mean value: 0.9120129870129869
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9 1. 0.9 1. 1. 1.
0.90909091 1. 1. 1. ]
mean value: 0.9709090909090909
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95 1. 0.72272727 0.95454545 0.95454545 0.95
0.90454545 0.95 1. 0.95 ]
mean value: 0.9336363636363636
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9 1. 0.6 0.90909091 0.90909091 0.91666667
0.83333333 0.91666667 1. 0.91666667]
mean value: 0.8901515151515151
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.07232475 0.05349302 0.05705237 0.06523085 0.04446149 0.04642963
0.0278461 0.02402425 0.02402234 0.0231905 ]
mean value: 0.04380753040313721
key: score_time
value: [0.02672768 0.02717161 0.02339268 0.01948118 0.02111459 0.0229249
0.01305509 0.01254392 0.01267433 0.01560426]
mean value: 0.019469022750854492
key: test_mcc
value: [0.55161872 0.74795759 0.74795759 0.55161872 0.82572282 0.90829511
0.74161985 0.66332496 0.90829511 0.80909091]
mean value: 0.7455501384398332
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.76190476 0.85714286 0.85714286 0.76190476 0.9047619 0.95238095
0.85714286 0.80952381 0.95238095 0.9047619 ]
mean value: 0.8619047619047618
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7826087 0.86956522 0.86956522 0.7826087 0.90909091 0.95652174
0.88 0.84615385 0.95652174 0.90909091]
mean value: 0.876172696868349
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.69230769 0.76923077 0.76923077 0.69230769 0.83333333 0.91666667
0.78571429 0.73333333 0.91666667 0.90909091]
mean value: 0.8017882117882118
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9 1. 1. 0.9 1. 1.
1. 1. 1. 0.90909091]
mean value: 0.9709090909090909
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.76818182 0.86363636 0.86363636 0.76818182 0.90909091 0.95
0.85 0.8 0.95 0.90454545]
mean value: 0.8627272727272727
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.64285714 0.76923077 0.76923077 0.64285714 0.83333333 0.91666667
0.78571429 0.73333333 0.91666667 0.83333333]
mean value: 0.7843223443223444
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.6
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.047019 0.05411768 0.05120516 0.08179045 0.03637123 0.03757405
0.0525198 0.0249393 0.07093072 0.05749607]
mean value: 0.05139634609222412
key: score_time
value: [0.06786156 0.01952934 0.03211212 0.01617074 0.04544282 0.0289197
0.03419495 0.01245379 0.02793956 0.02250385]
mean value: 0.03071284294128418
key: test_mcc
value: [0.52295779 0.80909091 0.44038551 1. 0.80909091 0.82572282
0.55161872 0.82275335 1. 0.63305416]
mean value: 0.7414674176772243
key: train_mcc
value: [0.97883539 0.95789003 0.94713854 0.9264031 0.94714446 0.94757483
0.96830553 0.94757483 0.93650616 0.93736014]
mean value: 0.9494732992955751
key: test_accuracy
value: [0.76190476 0.9047619 0.71428571 1. 0.9047619 0.9047619
0.76190476 0.9047619 1. 0.80952381]
mean value: 0.8666666666666667
key: train_accuracy
value: [0.98941799 0.97883598 0.97354497 0.96296296 0.97354497 0.97354497
0.98412698 0.97354497 0.96825397 0.96825397]
mean value: 0.9746031746031746
key: test_fscore
value: [0.73684211 0.9 0.72727273 1. 0.9 0.9
0.73684211 0.91666667 1. 0.8 ]
mean value: 0.861762360446571
key: train_fscore
value: [0.98947368 0.9787234 0.97382199 0.96256684 0.97354497 0.97382199
0.98395722 0.97382199 0.96808511 0.96875 ]
mean value: 0.9746567201151308
key: test_precision
value: [0.77777778 0.9 0.66666667 1. 0.9 1.
0.875 0.84615385 1. 0.88888889]
mean value: 0.8854487179487179
key: train_precision
value: [0.98947368 0.98924731 0.96875 0.97826087 0.9787234 0.95876289
0.98924731 0.95876289 0.96808511 0.94897959]
mean value: 0.9728293053102567
key: test_recall
value: [0.7 0.9 0.8 1. 0.9 0.81818182
0.63636364 1. 1. 0.72727273]
mean value: 0.8481818181818181
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_sl.py:128: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_sl.py:131: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_recall
value: [0.98947368 0.96842105 0.97894737 0.94736842 0.96842105 0.9893617
0.9787234 0.9893617 0.96808511 0.9893617 ]
mean value: 0.9767525195968645
key: test_roc_auc
value: [0.75909091 0.90454545 0.71818182 1. 0.90454545 0.90909091
0.76818182 0.9 1. 0.81363636]
mean value: 0.8677272727272727
key: train_roc_auc
value: [0.98941769 0.97889138 0.97351624 0.96304591 0.97357223 0.97362822
0.98409854 0.97362822 0.96825308 0.96836506]
mean value: 0.9746416573348264
key: test_jcc
value: [0.58333333 0.81818182 0.57142857 1. 0.81818182 0.81818182
0.58333333 0.84615385 1. 0.66666667]
mean value: 0.7705461205461206
key: train_jcc
value: [0.97916667 0.95833333 0.94897959 0.92783505 0.94845361 0.94897959
0.96842105 0.94897959 0.93814433 0.93939394]
mean value: 0.9506686757226445
MCC on Blind test: 0.58
Accuracy on Blind test: 0.8
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.24432898 0.38443637 0.46159816 0.37851357 0.30649304 0.50042582
0.33015323 0.36040902 0.35314488 0.34496379]
mean value: 0.36644668579101564
key: score_time
value: [0.02193141 0.03222609 0.03092313 0.02275586 0.02286959 0.02013755
0.02346706 0.01971507 0.01932216 0.02431846]
mean value: 0.023766636848449707
key: test_mcc
value: [0.62641448 0.71562645 0.52295779 1. 0.80909091 0.52727273
0.42727273 0.82275335 1. 0.63305416]
mean value: 0.7084442598864285
key: train_mcc
value: [0.98947251 0.94714446 0.95767077 0.93650616 0.93650616 0.95789003
0.95789003 0.94757483 0.93650616 0.93736014]
mean value: 0.9504521239520874
key: test_accuracy
value: [0.80952381 0.85714286 0.76190476 1. 0.9047619 0.76190476
0.71428571 0.9047619 1. 0.80952381]
mean value: 0.8523809523809524
key: train_accuracy
value: [0.99470899 0.97354497 0.97883598 0.96825397 0.96825397 0.97883598
0.97883598 0.97354497 0.96825397 0.96825397]
mean value: 0.9751322751322751
key: test_fscore
value: [0.77777778 0.84210526 0.73684211 1. 0.9 0.76190476
0.72727273 0.91666667 1. 0.8 ]
mean value: 0.8462569302042986
key: train_fscore
value: [0.9947644 0.97354497 0.97894737 0.96842105 0.96842105 0.97894737
0.97894737 0.97382199 0.96808511 0.96875 ]
mean value: 0.9752650677888823
key: test_precision
value: [0.875 0.88888889 0.77777778 1. 0.9 0.8
0.72727273 0.84615385 1. 0.88888889]
mean value: 0.8703982128982128
key: train_precision
value: [0.98958333 0.9787234 0.97894737 0.96842105 0.96842105 0.96875
0.96875 0.95876289 0.96808511 0.94897959]
mean value: 0.9697423796090515
key: test_recall
value: [0.7 0.8 0.7 1. 0.9 0.72727273
0.72727273 1. 1. 0.72727273]
mean value: 0.8281818181818181
key: train_recall
value: [1. 0.96842105 0.97894737 0.96842105 0.96842105 0.9893617
0.9893617 0.9893617 0.96808511 0.9893617 ]
mean value: 0.9809742441209407
key: test_roc_auc
value: [0.80454545 0.85454545 0.75909091 1. 0.90454545 0.76363636
0.71363636 0.9 1. 0.81363636]
mean value: 0.8513636363636363
key: train_roc_auc
value: [0.99468085 0.97357223 0.97883539 0.96825308 0.96825308 0.97889138
0.97889138 0.97362822 0.96825308 0.96836506]
mean value: 0.9751623740201567
key: test_jcc
value: [0.63636364 0.72727273 0.58333333 1. 0.81818182 0.61538462
0.57142857 0.84615385 1. 0.66666667]
mean value: 0.7464785214785215
key: train_jcc
value: [0.98958333 0.94845361 0.95876289 0.93877551 0.93877551 0.95876289
0.95876289 0.94897959 0.93814433 0.93939394]
mean value: 0.9518394482910315
MCC on Blind test: 0.58
Accuracy on Blind test: 0.8
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03008127 0.03638411 0.03371334 0.03461933 0.088166 0.07241488
0.03378487 0.06745028 0.06802797 0.06680107]
mean value: 0.053144311904907225
key: score_time
value: [0.01519585 0.01250982 0.01437712 0.01443815 0.02278161 0.01534963
0.01221824 0.02406025 0.01457262 0.01535487]
mean value: 0.0160858154296875
key: test_mcc
value: [0.33028913 0.62641448 0.44038551 0.61818182 0.82275335 0.80909091
0.55161872 0.80909091 0.42727273 0.71818182]
mean value: 0.6153279376024733
key: train_mcc
value: [0.8738236 0.8314659 0.83068309 0.862486 0.8738236 0.87319373
0.90480458 0.88402082 0.86284197 0.89438907]
mean value: 0.8691532351249575
key: test_accuracy
value: [0.66666667 0.80952381 0.71428571 0.80952381 0.9047619 0.9047619
0.76190476 0.9047619 0.71428571 0.85714286]
mean value: 0.8047619047619048
key: train_accuracy
value: [0.93650794 0.91534392 0.91534392 0.93121693 0.93650794 0.93650794
0.95238095 0.94179894 0.93121693 0.94708995]
mean value: 0.9343915343915343
key: test_fscore
value: [0.63157895 0.77777778 0.72727273 0.8 0.88888889 0.90909091
0.73684211 0.90909091 0.72727273 0.85714286]
mean value: 0.7964957849168376
key: train_fscore
value: [0.93548387 0.91397849 0.91578947 0.93121693 0.93548387 0.93548387
0.95187166 0.94054054 0.92972973 0.94736842]
mean value: 0.9336946861504937
key: test_precision
value: [0.66666667 0.875 0.66666667 0.8 1. 0.90909091
0.875 0.90909091 0.72727273 0.9 ]
mean value: 0.8328787878787879
key: train_precision
value: [0.95604396 0.93406593 0.91578947 0.93617021 0.95604396 0.94565217
0.95698925 0.95604396 0.94505495 0.9375 ]
mean value: 0.9439353854927787
key: test_recall
value: [0.6 0.7 0.8 0.8 0.8 0.90909091
0.63636364 0.90909091 0.72727273 0.81818182]
mean value: 0.77
key: train_recall
value: [0.91578947 0.89473684 0.91578947 0.92631579 0.91578947 0.92553191
0.94680851 0.92553191 0.91489362 0.95744681]
mean value: 0.9238633818589026
key: test_roc_auc
value: [0.66363636 0.80454545 0.71818182 0.80909091 0.9 0.90454545
0.76818182 0.90454545 0.71363636 0.85909091]
mean value: 0.8045454545454546
key: train_roc_auc
value: [0.93661814 0.91545353 0.91534155 0.931243 0.93661814 0.93645017
0.95235162 0.94171333 0.93113102 0.94714446]
mean value: 0.9344064949608063
key: test_jcc
value: [0.46153846 0.63636364 0.57142857 0.66666667 0.8 0.83333333
0.58333333 0.83333333 0.57142857 0.75 ]
mean value: 0.6707425907425908
key: train_jcc
value: [0.87878788 0.84158416 0.84466019 0.87128713 0.87878788 0.87878788
0.90816327 0.8877551 0.86868687 0.9 ]
mean value: 0.8758500353700914
MCC on Blind test: 0.61
Accuracy on Blind test: 0.8
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.30685544 1.22656345 2.2850945 1.44909978 0.99897575 0.93024874
0.82502961 1.04818726 1.14439416 0.94249988]
mean value: 1.2156948566436767
key: score_time
value: [0.05694318 0.05938172 0.03290272 0.0146718 0.01458669 0.01460767
0.01712728 0.01470828 0.01468945 0.01212597]
mean value: 0.025174474716186522
key: test_mcc
value: [0.61818182 0.71562645 0.52295779 1. 0.90829511 0.74795759
0.55161872 1. 0.71818182 0.67419986]
mean value: 0.7457019156935036
key: train_mcc
value: [1. 0.98947368 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9989473684210526
key: test_accuracy
value: [0.80952381 0.85714286 0.76190476 1. 0.95238095 0.85714286
0.76190476 1. 0.85714286 0.80952381]
mean value: 0.8666666666666667
key: train_accuracy
value: [1. 0.99470899 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9994708994708995
key: test_fscore
value: [0.8 0.84210526 0.73684211 1. 0.94736842 0.84210526
0.73684211 1. 0.85714286 0.77777778]
mean value: 0.8540183792815372
key: train_fscore
value: [1. 0.99470899 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9994708994708995
key: test_precision
value: [0.8 0.88888889 0.77777778 1. 1. 1.
0.875 1. 0.9 1. ]
mean value: 0.9241666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.8 0.7 1. 0.9 0.72727273
0.63636364 1. 0.81818182 0.63636364]
mean value: 0.8018181818181819
key: train_recall
value: [1. 0.98947368 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9989473684210526
key: test_roc_auc
value: [0.80909091 0.85454545 0.75909091 1. 0.95 0.86363636
0.76818182 1. 0.85909091 0.81818182]
mean value: 0.8681818181818182
key: train_roc_auc
value: [1. 0.99473684 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9994736842105263
key: test_jcc
value: [0.66666667 0.72727273 0.58333333 1. 0.9 0.72727273
0.58333333 1. 0.75 0.63636364]
mean value: 0.7574242424242424
key: train_jcc
value: [1. 0.98947368 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9989473684210526
MCC on Blind test: 0.6
Accuracy on Blind test: 0.8
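
As a cross-check on the fold-level scores logged above for Logistic RegressionCV, the sketch below recomputes the first value of each test_* array from a single confusion matrix. The counts (TP=8, FN=2, FP=2, TN=9) are an assumption inferred from the logged fold-1 precision, recall and accuracy; the log itself does not print them. The helper name fold_metrics is hypothetical. The roc_auc line assumes the score was computed from hard class predictions, in which case it equals the mean of recall and specificity.

```python
from math import sqrt

# Assumed fold-1 confusion matrix, inferred from the logged scores
# (precision 0.8, recall 0.8, accuracy 0.80952381); not printed in the log.
tp, fn, fp, tn = 8, 2, 2, 9


def fold_metrics(tp, fn, fp, tn):
    """Recompute the logged test_* metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),
        # Matthews correlation coefficient
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # Jaccard index of the positive class
        "jcc": tp / (tp + fp + fn),
        # ROC AUC from hard labels (assumption): mean of recall and specificity
        "roc_auc": (recall + specificity) / 2,
    }


metrics = fold_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"test_{name}: {value:.8f}")
```

Under these assumed counts the recomputed values agree with the first entry of each logged test_* array (for example test_mcc 0.61818182 and test_jcc 0.66666667), which suggests the fold held 21 samples with 10 positives.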
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01341915 0.00960445 0.009269 0.00893736 0.00891161 0.0090003
0.00897193 0.00950503 0.00953388 0.01000404]
mean value: 0.009715676307678223
key: score_time
value: [0.01628661 0.01169658 0.00924897 0.00879931 0.00866485 0.00861454
0.00882459 0.00885248 0.00950265 0.0094347 ]
mean value: 0.009992527961730956
key: test_mcc
value: [ 0.35527986 -0.23373675 0.11677484 0.23373675 0.39196475 0.50874702
0.42727273 0.33709993 0.15569979 0.80909091]
mean value: 0.3101929821275265
key: train_mcc
value: [0.4223863 0.42057994 0.42563559 0.42871542 0.43824416 0.39871188
0.49053012 0.44175632 0.4436004 0.39053852]
mean value: 0.4300698660601598
key: test_accuracy
value: [0.66666667 0.38095238 0.52380952 0.61904762 0.66666667 0.71428571
0.71428571 0.66666667 0.57142857 0.9047619 ]
mean value: 0.6428571428571428
key: train_accuracy
value: [0.6984127 0.69312169 0.7037037 0.71428571 0.70899471 0.68783069
0.73544974 0.7037037 0.69312169 0.67724868]
mean value: 0.7015873015873015
key: test_fscore
value: [0.69565217 0.43478261 0.64285714 0.55555556 0.72 0.78571429
0.72727273 0.72 0.68965517 0.90909091]
mean value: 0.688058057551311
key: train_fscore
value: [0.74439462 0.74561404 0.74311927 0.71276596 0.74885845 0.73059361
0.76635514 0.75 0.75213675 0.73127753]
mean value: 0.7425115357581491
key: test_precision
value: [0.61538462 0.38461538 0.5 0.625 0.6 0.64705882
0.72727273 0.64285714 0.55555556 0.90909091]
mean value: 0.6206835158305747
key: train_precision
value: [0.6484375 0.63909774 0.65853659 0.72043011 0.66129032 0.64
0.68333333 0.64615385 0.62857143 0.62406015]
mean value: 0.654991101826883
key: test_recall
value: [0.8 0.5 0.9 0.5 0.9 1.
0.72727273 0.81818182 0.90909091 0.90909091]
mean value: 0.7963636363636364
key: train_recall
value: [0.87368421 0.89473684 0.85263158 0.70526316 0.86315789 0.85106383
0.87234043 0.89361702 0.93617021 0.88297872]
mean value: 0.8625643896976484
key: test_roc_auc
value: [0.67272727 0.38636364 0.54090909 0.61363636 0.67727273 0.7
0.71363636 0.65909091 0.55454545 0.90454545]
mean value: 0.6422727272727272
key: train_roc_auc
value: [0.6974804 0.69204927 0.70291153 0.71433371 0.70817469 0.68868981
0.73617021 0.70470325 0.6944009 0.67833147]
mean value: 0.7017245240761478
key: test_jcc
value: [0.53333333 0.27777778 0.47368421 0.38461538 0.5625 0.64705882
0.57142857 0.5625 0.52631579 0.83333333]
mean value: 0.5372547224017812
key: train_jcc
value: [0.59285714 0.59440559 0.59124088 0.55371901 0.59854015 0.57553957
0.62121212 0.6 0.60273973 0.57638889]
mean value: 0.5906643071898742
MCC on Blind test: 0.12
Accuracy on Blind test: 0.6
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00971127 0.00953364 0.00909424 0.00915623 0.00902915 0.00920463
0.00920725 0.00904489 0.00909853 0.00923586]
mean value: 0.0092315673828125
key: score_time
value: [0.00908184 0.00872087 0.00866318 0.00864768 0.0086987 0.00893164
0.00869703 0.00877166 0.00874424 0.00874138]
mean value: 0.008769822120666505
key: test_mcc
value: [ 0.24771685 -0.08528029 0.26967994 0.43007562 0.45226702 0.24771685
0.15894099 0.33028913 0.13762047 0.30914104]
mean value: 0.24981676122162266
key: train_mcc
value: [0.47383838 0.43945337 0.49511046 0.41111248 0.4606251 0.46213311
0.45044462 0.43913092 0.44972004 0.47093091]
mean value: 0.45524993919694523
key: test_accuracy
value: [0.61904762 0.47619048 0.61904762 0.71428571 0.71428571 0.61904762
0.57142857 0.66666667 0.57142857 0.61904762]
mean value: 0.6190476190476191
key: train_accuracy
value: [0.73544974 0.71957672 0.74603175 0.7037037 0.73015873 0.73015873
0.72486772 0.71957672 0.72486772 0.73544974]
mean value: 0.726984126984127
key: test_fscore
value: [0.63636364 0.26666667 0.66666667 0.66666667 0.625 0.6
0.52631579 0.69565217 0.60869565 0.5 ]
mean value: 0.5792027251924277
key: train_fscore
value: [0.72222222 0.71657754 0.73333333 0.68539326 0.72727273 0.7150838
0.71428571 0.71657754 0.72340426 0.7311828 ]
mean value: 0.7185333185655622
key: test_precision
value: [0.58333333 0.4 0.57142857 0.75 0.83333333 0.66666667
0.625 0.66666667 0.58333333 0.8 ]
mean value: 0.6479761904761905
key: train_precision
value: [0.76470588 0.72826087 0.77647059 0.73493976 0.73913043 0.75294118
0.73863636 0.72043011 0.72340426 0.73913043]
mean value: 0.7418049871707797
key: test_recall
value: [0.7 0.2 0.8 0.6 0.5 0.54545455
0.45454545 0.72727273 0.63636364 0.36363636]
mean value: 0.5527272727272727
key: train_recall
value: [0.68421053 0.70526316 0.69473684 0.64210526 0.71578947 0.68085106
0.69148936 0.71276596 0.72340426 0.72340426]
mean value: 0.6974020156774916
key: test_roc_auc
value: [0.62272727 0.46363636 0.62727273 0.70909091 0.70454545 0.62272727
0.57727273 0.66363636 0.56818182 0.63181818]
mean value: 0.6190909090909091
key: train_roc_auc
value: [0.73572228 0.71965286 0.74630459 0.70403135 0.73023516 0.72989922
0.72469205 0.71954087 0.72486002 0.73538634]
mean value: 0.7270324748040313
key: test_jcc
value: [0.46666667 0.15384615 0.5 0.5 0.45454545 0.42857143
0.35714286 0.53333333 0.4375 0.33333333]
mean value: 0.4164939227439227
key: train_jcc
value: [0.56521739 0.55833333 0.57894737 0.52136752 0.57142857 0.55652174
0.55555556 0.55833333 0.56666667 0.57627119]
mean value: 0.5608642666981495
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00931072 0.00956655 0.00860357 0.00860739 0.00866055 0.00952816
0.00955248 0.00887895 0.00881219 0.00958371]
mean value: 0.009110426902770996
key: score_time
value: [0.01484823 0.0155077 0.01463413 0.014539 0.01421666 0.01497364
0.01540303 0.01495123 0.01449418 0.01488686]
mean value: 0.014845466613769532
key: test_mcc
value: [-0.23636364 0.24120908 -0.14545455 0.42727273 -0.15894099 0.04545455
0.08528029 0.13762047 0.15894099 0.08528029]
mean value: 0.06402992102965852
key: train_mcc
value: [0.50296855 0.39710991 0.42871542 0.39710991 0.3454297 0.51484568
0.49237699 0.43913092 0.44975918 0.39915366]
mean value: 0.43665999514803383
key: test_accuracy
value: [0.38095238 0.61904762 0.42857143 0.71428571 0.42857143 0.52380952
0.52380952 0.57142857 0.57142857 0.52380952]
mean value: 0.5285714285714286
key: train_accuracy
value: [0.75132275 0.6984127 0.71428571 0.6984127 0.67195767 0.75661376
0.74603175 0.71957672 0.72486772 0.6984127 ]
mean value: 0.717989417989418
key: test_fscore
value: [0.38095238 0.5 0.4 0.7 0.33333333 0.54545455
0.375 0.60869565 0.52631579 0.375 ]
mean value: 0.4744751701387857
key: train_fscore
value: [0.7486631 0.69518717 0.71276596 0.69518717 0.65934066 0.74444444
0.73913043 0.71657754 0.72043011 0.6779661 ]
mean value: 0.710969267849835
key: test_precision
value: [0.36363636 0.66666667 0.4 0.7 0.375 0.54545455
0.6 0.58333333 0.625 0.6 ]
mean value: 0.5459090909090909
key: train_precision
value: [0.76086957 0.70652174 0.72043011 0.70652174 0.68965517 0.77906977
0.75555556 0.72043011 0.72826087 0.72289157]
mean value: 0.7290206189773512
key: test_recall
value: [0.4 0.4 0.4 0.7 0.3 0.54545455
0.27272727 0.63636364 0.45454545 0.27272727]
mean value: 0.4381818181818182
key: train_recall
value: [0.73684211 0.68421053 0.70526316 0.68421053 0.63157895 0.71276596
0.72340426 0.71276596 0.71276596 0.63829787]
mean value: 0.6942105263157895
key: test_roc_auc
value: [0.38181818 0.60909091 0.42727273 0.71363636 0.42272727 0.52272727
0.53636364 0.56818182 0.57727273 0.53636364]
mean value: 0.5295454545454545
key: train_roc_auc
value: [0.75139978 0.69848824 0.71433371 0.69848824 0.67217245 0.75638298
0.74591265 0.71954087 0.72480403 0.6980963 ]
mean value: 0.7179619260918253
key: test_jcc
value: [0.23529412 0.33333333 0.25 0.53846154 0.2 0.375
0.23076923 0.4375 0.35714286 0.23076923]
mean value: 0.31882703081232494
key: train_jcc
value: [0.5982906 0.53278689 0.55371901 0.53278689 0.49180328 0.59292035
0.5862069 0.55833333 0.56302521 0.51282051]
mean value: 0.5522692962507294
MCC on Blind test: 0.17
Accuracy on Blind test: 0.6
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01352286 0.01665878 0.01201367 0.01270628 0.01311684 0.0131042
0.01292396 0.01178217 0.01296258 0.01138806]
mean value: 0.013017940521240234
key: score_time
value: [0.011729 0.00963879 0.01002765 0.01018095 0.01015759 0.01019979
0.01010227 0.00988412 0.01014733 0.00937819]
mean value: 0.010144567489624024
key: test_mcc
value: [0.26967994 0.03739788 0.30914104 0.61818182 0.33636364 0.42727273
0.24771685 0.52295779 0.33709993 0.82572282]
mean value: 0.3931534435784296
key: train_mcc
value: [0.73576888 0.75666293 0.74744848 0.75694773 0.73585755 0.75666293
0.80967855 0.82013664 0.80967855 0.8102023 ]
mean value: 0.7739044547190053
key: test_accuracy
value: [0.61904762 0.52380952 0.61904762 0.80952381 0.66666667 0.71428571
0.61904762 0.76190476 0.66666667 0.9047619 ]
mean value: 0.6904761904761905
key: train_accuracy
value: [0.86772487 0.87830688 0.87301587 0.87830688 0.86772487 0.87830688
0.9047619 0.91005291 0.9047619 0.9047619 ]
mean value: 0.8867724867724868
key: test_fscore
value: [0.66666667 0.44444444 0.69230769 0.8 0.66666667 0.72727273
0.6 0.7826087 0.72 0.9 ]
mean value: 0.6999966893010371
key: train_fscore
value: [0.87046632 0.87830688 0.87755102 0.88082902 0.86631016 0.87830688
0.90322581 0.90909091 0.90322581 0.90217391]
mean value: 0.8869486709274905
key: test_precision
value: [0.57142857 0.5 0.5625 0.8 0.63636364 0.72727273
0.66666667 0.75 0.64285714 1. ]
mean value: 0.6857088744588744
key: train_precision
value: [0.85714286 0.88297872 0.85148515 0.86734694 0.88043478 0.87368421
0.91304348 0.91397849 0.91304348 0.92222222]
mean value: 0.8875360334340103
key: test_recall
value: [0.8 0.4 0.9 0.8 0.7 0.72727273
0.54545455 0.81818182 0.81818182 0.81818182]
mean value: 0.7327272727272728
key: train_recall
value: [0.88421053 0.87368421 0.90526316 0.89473684 0.85263158 0.88297872
0.89361702 0.90425532 0.89361702 0.88297872]
mean value: 0.8867973124300113
key: test_roc_auc
value: [0.62727273 0.51818182 0.63181818 0.80909091 0.66818182 0.71363636
0.62272727 0.75909091 0.65909091 0.90909091]
mean value: 0.6918181818181819
key: train_roc_auc
value: [0.86763718 0.87833147 0.87284434 0.87821948 0.86780515 0.87833147
0.90470325 0.9100224 0.90470325 0.90464726]
mean value: 0.8867245240761478
key: test_jcc
value: [0.5 0.28571429 0.52941176 0.66666667 0.5 0.57142857
0.42857143 0.64285714 0.5625 0.81818182]
mean value: 0.5505331678125795
key: train_jcc
value: [0.7706422 0.78301887 0.78181818 0.78703704 0.76415094 0.78301887
0.82352941 0.83333333 0.82352941 0.82178218]
mean value: 0.7971860435015932
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.80780959 1.64683247 2.04292989 2.00777054 2.20077896 2.11120558
1.75354791 1.55821252 1.74798012 1.62602258]
mean value: 1.7503090143203734
key: score_time
value: [0.01250052 0.03408003 0.01242089 0.0147202 0.02710557 0.03695774
0.02797961 0.02157736 0.03396821 0.04308009]
mean value: 0.0264390230178833
key: test_mcc
value: [0.23636364 0.62641448 0.63305416 0.71562645 0.71562645 0.60302269
0.4719399 0.90909091 0.52727273 0.82572282]
mean value: 0.6264134232369432
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61904762 0.80952381 0.80952381 0.85714286 0.85714286 0.76190476
0.71428571 0.95238095 0.76190476 0.9047619 ]
mean value: 0.8047619047619048
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.6 0.77777778 0.81818182 0.84210526 0.84210526 0.70588235
0.66666667 0.95238095 0.76190476 0.9 ]
mean value: 0.7867004856168943
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 0.875 0.75 0.88888889 0.88888889 1.
0.85714286 1. 0.8 1. ]
mean value: 0.8659920634920635
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6 0.7 0.9 0.8 0.8 0.54545455
0.54545455 0.90909091 0.72727273 0.81818182]
mean value: 0.7345454545454545
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.61818182 0.80454545 0.81363636 0.85454545 0.85454545 0.77272727
0.72272727 0.95454545 0.76363636 0.90909091]
mean value: 0.8068181818181819
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.42857143 0.63636364 0.69230769 0.72727273 0.72727273 0.54545455
0.5 0.90909091 0.61538462 0.81818182]
mean value: 0.65999000999001
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.6
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03607321 0.01854372 0.01873636 0.01854372 0.01878786 0.01843524
0.0188539 0.0193491 0.01886439 0.01850581]
mean value: 0.02046933174133301
key: score_time
value: [0.0126884 0.0127058 0.01279783 0.0127821 0.01274991 0.01255703
0.01291847 0.01300454 0.01293087 0.01302671]
mean value: 0.012816166877746582
key: test_mcc
value: [0.82275335 0.82275335 0.71818182 1. 1. 0.90909091
0.52727273 1. 0.82572282 0.90909091]
mean value: 0.8534865889896018
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9047619 0.9047619 0.85714286 1. 1. 0.95238095
0.76190476 1. 0.9047619 0.95238095]
mean value: 0.9238095238095237
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.88888889 0.85714286 1. 1. 0.95238095
0.76190476 1. 0.9 0.95238095]
mean value: 0.9201587301587302
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.81818182 1. 1. 1.
0.8 1. 1. 1. ]
mean value: 0.9618181818181818
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.8 0.9 1. 1. 0.90909091
0.72727273 1. 0.81818182 0.90909091]
mean value: 0.8863636363636364
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 0.9 0.85909091 1. 1. 0.95454545
0.76363636 1. 0.90909091 0.95454545]
mean value: 0.9240909090909091
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.8 0.75 1. 1. 0.90909091
0.61538462 1. 0.81818182 0.90909091]
mean value: 0.8601748251748251
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.49
Accuracy on Blind test: 0.73
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.14116931 0.14187193 0.14279318 0.14259577 0.14345264 0.14271784
0.14100552 0.1444006 0.28073263 0.1378026 ]
mean value: 0.1558542013168335
key: score_time
value: [0.02478743 0.02508497 0.02502012 0.02513218 0.02525902 0.02499199
0.02501416 0.02670145 0.0441525 0.02399564]
mean value: 0.027013945579528808
key: test_mcc
value: [0.33636364 0.33028913 0.63305416 0.71562645 1. 0.90909091
0.55161872 0.90909091 0.90829511 0.90909091]
mean value: 0.7202519935788301
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.66666667 0.66666667 0.80952381 0.85714286 1. 0.95238095
0.76190476 0.95238095 0.95238095 0.95238095]
mean value: 0.8571428571428571
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.63157895 0.81818182 0.84210526 1. 0.95238095
0.73684211 0.95238095 0.95652174 0.95238095]
mean value: 0.850903939691125
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.63636364 0.66666667 0.75 0.88888889 1. 1.
0.875 1. 0.91666667 1. ]
mean value: 0.8733585858585858
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.7 0.6 0.9 0.8 1. 0.90909091
0.63636364 0.90909091 1. 0.90909091]
mean value: 0.8363636363636363
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.66818182 0.66363636 0.81363636 0.85454545 1. 0.95454545
0.76818182 0.95454545 0.95 0.95454545]
mean value: 0.8581818181818182
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.46153846 0.69230769 0.72727273 1. 0.90909091
0.58333333 0.90909091 0.91666667 0.90909091]
mean value: 0.7608391608391608
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.29
Accuracy on Blind test: 0.67
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01380777 0.01287603 0.01335287 0.01323295 0.01333666 0.0131979
0.01330137 0.01331997 0.01724935 0.01322055]
mean value: 0.013689541816711425
key: score_time
value: [0.01226544 0.01245737 0.01221156 0.01228142 0.01906157 0.01224637
0.01229739 0.01220536 0.02101064 0.01225805]
mean value: 0.013829517364501952
key: test_mcc
value: [0.23636364 0.33636364 0.33028913 0.62641448 0.62641448 0.71818182
0.35527986 0.82572282 0.63305416 0.52727273]
mean value: 0.52153567593267
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61904762 0.66666667 0.66666667 0.80952381 0.80952381 0.85714286
0.66666667 0.9047619 0.80952381 0.76190476]
mean value: 0.7571428571428571
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.6 0.66666667 0.63157895 0.77777778 0.77777778 0.85714286
0.63157895 0.9 0.8 0.76190476]
mean value: 0.7404427736006683
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 0.63636364 0.66666667 0.875 0.875 0.9
0.75 1. 0.88888889 0.8 ]
mean value: 0.7991919191919192
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6 0.7 0.6 0.7 0.7 0.81818182
0.54545455 0.81818182 0.72727273 0.72727273]
mean value: 0.6936363636363636
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.61818182 0.66818182 0.66363636 0.80454545 0.80454545 0.85909091
0.67272727 0.90909091 0.81363636 0.76363636]
mean value: 0.7577272727272728
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.42857143 0.5 0.46153846 0.63636364 0.63636364 0.75
0.46153846 0.81818182 0.66666667 0.61538462]
mean value: 0.5974608724608724
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.89492655 1.78066468 1.30577731 1.62279463 1.80648351 2.03026009
1.99937129 1.9573288 1.32665277 1.25374627]
mean value: 1.697800588607788
key: score_time
value: [0.13693285 0.12314796 0.09956264 0.12987232 0.12343669 0.12388182
0.15492225 0.12438631 0.09179854 0.09004569]
mean value: 0.11979870796203614
key: test_mcc
value: [0.23636364 0.58630197 0.63305416 0.80909091 1. 1.
0.55161872 0.90909091 0.90909091 0.90909091]
mean value: 0.7543702131757848
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61904762 0.76190476 0.80952381 0.9047619 1. 1.
0.76190476 0.95238095 0.95238095 0.95238095]
mean value: 0.8714285714285714
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.6 0.66666667 0.81818182 0.9 1. 1.
0.73684211 0.95238095 0.95238095 0.95238095]
mean value: 0.8578833447254499
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 1. 0.75 0.9 1. 1. 0.875 1. 1. 1. ]
mean value: 0.9125
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6 0.5 0.9 0.9 1. 1.
0.63636364 0.90909091 0.90909091 0.90909091]
mean value: 0.8263636363636364
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.61818182 0.75 0.81363636 0.90454545 1. 1.
0.76818182 0.95454545 0.95454545 0.95454545]
mean value: 0.8718181818181818
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: test_jcc
value: [0.42857143 0.5 0.69230769 0.81818182 1. 1.
0.58333333 0.90909091 0.90909091 0.90909091]
mean value: 0.7749666999667
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.88374734 0.898525 0.88748789 0.90992236 0.87181115 0.90699387
0.91898417 0.8755424 0.88821411 0.94180608]
mean value: 0.8983034372329712
key: score_time
value: [0.13070011 0.15602064 0.2069819 0.17687368 0.21706963 0.24573994
0.12457681 0.16810298 0.19927859 0.16002679]
mean value: 0.17853710651397706
key: test_mcc
value: [0.52295779 0.24120908 0.4719399 0.62641448 0.90829511 0.90829511
0.63305416 0.90909091 0.71562645 0.82572282]
mean value: 0.6762605808801134
key: train_mcc
value: [0.95767077 0.95767077 0.95788064 0.95767077 0.96830553 0.95789003
0.98947368 0.96830907 0.96830907 0.96830907]
mean value: 0.9651489409652466
key: test_accuracy
value: [0.76190476 0.61904762 0.71428571 0.80952381 0.95238095 0.95238095
0.80952381 0.95238095 0.85714286 0.9047619 ]
mean value: 0.8333333333333333
key: train_accuracy
value: [0.97883598 0.97883598 0.97883598 0.97883598 0.98412698 0.97883598
0.99470899 0.98412698 0.98412698 0.98412698]
mean value: 0.9825396825396825
key: test_fscore
value: [0.73684211 0.5 0.75 0.77777778 0.94736842 0.95652174
0.8 0.95238095 0.86956522 0.9 ]
mean value: 0.8190456212996259
key: train_fscore
value: [0.97894737 0.97894737 0.97916667 0.97894737 0.98429319 0.97894737
0.99470899 0.98412698 0.98412698 0.98412698]
mean value: 0.9826339281158102
key: test_precision
value: [0.77777778 0.66666667 0.64285714 0.875 1. 0.91666667
0.88888889 1. 0.83333333 1. ]
mean value: 0.8601190476190477
key: train_precision
value: [0.97894737 0.97894737 0.96907216 0.97894737 0.97916667 0.96875
0.98947368 0.97894737 0.97894737 0.97894737]
mean value: 0.9780146726351963
key: test_recall
value: [0.7 0.4 0.9 0.7 0.9 1.
0.72727273 0.90909091 0.90909091 0.81818182]
mean value: 0.7963636363636364
key: train_recall
value: [0.97894737 0.97894737 0.98947368 0.97894737 0.98947368 0.9893617
1. 0.9893617 0.9893617 0.9893617 ]
mean value: 0.9873236282194849
key: test_roc_auc
value: [0.75909091 0.60909091 0.72272727 0.80454545 0.95 0.95
0.81363636 0.95454545 0.85454545 0.90909091]
mean value: 0.8327272727272728
key: train_roc_auc
value: [0.97883539 0.97883539 0.9787794 0.97883539 0.98409854 0.97889138
0.99473684 0.98415454 0.98415454 0.98415454]
mean value: 0.9825475923852184
key: test_jcc
value: [0.58333333 0.33333333 0.6 0.63636364 0.9 0.91666667
0.66666667 0.90909091 0.76923077 0.81818182]
mean value: 0.7132867132867133
key: train_jcc
value: [0.95876289 0.95876289 0.95918367 0.95876289 0.96907216 0.95876289
0.98947368 0.96875 0.96875 0.96875 ]
mean value: 0.9659031069020121
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02410841 0.00991416 0.01022744 0.01032209 0.00959754 0.01036572
0.00976348 0.00933981 0.01035428 0.01001096]
mean value: 0.011400389671325683
key: score_time
value: [0.00971651 0.00905752 0.00934982 0.00952864 0.00984573 0.00897074
0.00895166 0.00975013 0.00975347 0.00914025]
mean value: 0.009406447410583496
key: test_mcc
value: [ 0.24771685 -0.08528029 0.26967994 0.43007562 0.45226702 0.24771685
0.15894099 0.33028913 0.13762047 0.30914104]
mean value: 0.24981676122162266
key: train_mcc
value: [0.47383838 0.43945337 0.49511046 0.41111248 0.4606251 0.46213311
0.45044462 0.43913092 0.44972004 0.47093091]
mean value: 0.45524993919694523
key: test_accuracy
value: [0.61904762 0.47619048 0.61904762 0.71428571 0.71428571 0.61904762
0.57142857 0.66666667 0.57142857 0.61904762]
mean value: 0.6190476190476191
key: train_accuracy
value: [0.73544974 0.71957672 0.74603175 0.7037037 0.73015873 0.73015873
0.72486772 0.71957672 0.72486772 0.73544974]
mean value: 0.726984126984127
key: test_fscore
value: [0.63636364 0.26666667 0.66666667 0.66666667 0.625 0.6
0.52631579 0.69565217 0.60869565 0.5 ]
mean value: 0.5792027251924277
key: train_fscore
value: [0.72222222 0.71657754 0.73333333 0.68539326 0.72727273 0.7150838
0.71428571 0.71657754 0.72340426 0.7311828 ]
mean value: 0.7185333185655622
key: test_precision
value: [0.58333333 0.4 0.57142857 0.75 0.83333333 0.66666667
0.625 0.66666667 0.58333333 0.8 ]
mean value: 0.6479761904761905
key: train_precision
value: [0.76470588 0.72826087 0.77647059 0.73493976 0.73913043 0.75294118
0.73863636 0.72043011 0.72340426 0.73913043]
mean value: 0.7418049871707797
key: test_recall
value: [0.7 0.2 0.8 0.6 0.5 0.54545455
0.45454545 0.72727273 0.63636364 0.36363636]
mean value: 0.5527272727272727
key: train_recall
value: [0.68421053 0.70526316 0.69473684 0.64210526 0.71578947 0.68085106
0.69148936 0.71276596 0.72340426 0.72340426]
mean value: 0.6974020156774916
key: test_roc_auc
value: [0.62272727 0.46363636 0.62727273 0.70909091 0.70454545 0.62272727
0.57727273 0.66363636 0.56818182 0.63181818]
mean value: 0.6190909090909091
key: train_roc_auc
value: [0.73572228 0.71965286 0.74630459 0.70403135 0.73023516 0.72989922
0.72469205 0.71954087 0.72486002 0.73538634]
mean value: 0.7270324748040313
key: test_jcc
value: [0.46666667 0.15384615 0.5 0.5 0.45454545 0.42857143
0.35714286 0.53333333 0.4375 0.33333333]
mean value: 0.4164939227439227
key: train_jcc
value: [0.56521739 0.55833333 0.57894737 0.52136752 0.57142857 0.55652174
0.55555556 0.55833333 0.56666667 0.57627119]
mean value: 0.5608642666981495
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [1.47359419 1.51395917 1.40394354 1.49806833 0.73248506 0.14855814
0.12012458 1.29104447 0.43357348 0.96943855]
mean value: 0.9584789514541626
key: score_time
value: [0.01303411 0.01369548 0.01240373 0.02015805 0.01262212 0.01179838
0.01309061 0.01261091 0.01320601 0.01363063]
mean value: 0.013625001907348633
key: test_mcc
value: [0.82275335 0.90829511 0.63305416 1. 1. 1.
0.71562645 1. 1. 0.82572282]
mean value: 0.8905451893561251
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9047619 0.95238095 0.80952381 1. 1. 1.
0.85714286 1. 1. 0.9047619 ]
mean value: 0.9428571428571428
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.94736842 0.81818182 1. 1. 1.
0.86956522 1. 1. 0.9 ]
mean value: 0.9424004345514643
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.75 1. 1. 1.
0.83333333 1. 1. 1. ]
mean value: 0.9583333333333334
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.9 0.9 1. 1. 1.
0.90909091 1. 1. 0.81818182]
mean value: 0.9327272727272727
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 0.95 0.81363636 1. 1. 1.
0.85454545 1. 1. 0.90909091]
mean value: 0.9427272727272727
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.9 0.69230769 1. 1. 1.
0.76923077 1. 1. 0.81818182]
mean value: 0.897972027972028
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04121399 0.07725215 0.07548904 0.03828049 0.06962919 0.06137753
0.04697084 0.05984068 0.04247427 0.06800699]
mean value: 0.05805351734161377
key: score_time
value: [0.01933718 0.0345602 0.01221371 0.01211905 0.0219357 0.01210523
0.02289724 0.01555729 0.02384901 0.02143526]
mean value: 0.019600987434387207
key: test_mcc
value: [0.53935989 0.44038551 0.53935989 0.80909091 0.74161985 0.82572282
0.67419986 0.90909091 0.82572282 0.53300179]
mean value: 0.6837554253924925
key: train_mcc
value: [0.97883539 0.94757483 0.98947251 0.97905701 0.97905701 0.96830553
0.95788064 0.95788064 0.92637852 0.95788064]
mean value: 0.9642322713040267
key: test_accuracy
value: [0.76190476 0.71428571 0.76190476 0.9047619 0.85714286 0.9047619
0.80952381 0.95238095 0.9047619 0.71428571]
mean value: 0.8285714285714285
key: train_accuracy
value: [0.98941799 0.97354497 0.99470899 0.98941799 0.98941799 0.98412698
0.97883598 0.97883598 0.96296296 0.97883598]
mean value: 0.982010582010582
key: test_fscore
value: [0.70588235 0.72727273 0.70588235 0.9 0.82352941 0.9
0.77777778 0.95238095 0.9 0.625 ]
mean value: 0.8017725575078516
key: train_fscore
value: [0.98947368 0.97326203 0.9947644 0.9893617 0.9893617 0.98395722
0.97849462 0.97849462 0.96216216 0.97849462]
mean value: 0.9817826770838407
key: test_precision
value: [0.85714286 0.66666667 0.85714286 0.9 1. 1.
1. 1. 1. 1. ]
mean value: 0.9280952380952381
key: train_precision
value: [0.98947368 0.98913043 0.98958333 1. 1. 0.98924731
0.98913043 0.98913043 0.97802198 0.98913043]
mean value: 0.990284804652423
key: test_recall
value: [0.6 0.8 0.6 0.9 0.7 0.81818182
0.63636364 0.90909091 0.81818182 0.45454545]
mean value: 0.7236363636363636
key: train_recall
value: [0.98947368 0.95789474 1. 0.97894737 0.97894737 0.9787234
0.96808511 0.96808511 0.94680851 0.96808511]
mean value: 0.973505039193729
key: test_roc_auc
value: [0.75454545 0.71818182 0.75454545 0.90454545 0.85 0.90909091
0.81818182 0.95454545 0.90909091 0.72727273]
mean value: 0.83
key: train_roc_auc
value: [0.98941769 0.97362822 0.99468085 0.98947368 0.98947368 0.98409854
0.9787794 0.9787794 0.96287794 0.9787794 ]
mean value: 0.9819988801791713
key: test_jcc
value: [0.54545455 0.57142857 0.54545455 0.81818182 0.7 0.81818182
0.63636364 0.90909091 0.81818182 0.45454545]
mean value: 0.6816883116883117
key: train_jcc
value: [0.97916667 0.94791667 0.98958333 0.97894737 0.97894737 0.96842105
0.95789474 0.95789474 0.92708333 0.95789474]
mean value: 0.964375
MCC on Blind test: 0.11
Accuracy on Blind test: 0.53
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02042866 0.01330423 0.01539516 0.01460958 0.01472521 0.0147078
0.01172614 0.01205611 0.00887656 0.00874543]
mean value: 0.013457489013671876
key: score_time
value: [0.02030158 0.01348376 0.01388836 0.01446819 0.01431298 0.01398635
0.01170921 0.00890064 0.00833917 0.00834632]
mean value: 0.012773656845092773
key: test_mcc
value: [ 0.33028913 -0.05504819 0.08528029 0.42727273 0.55161872 0.62641448
0.23636364 0.24120908 0.02312486 0.55161872]
mean value: 0.30181434631411985
key: train_mcc
value: [0.39005594 0.40281841 0.35974476 0.37994444 0.43065616 0.40240809
0.44248737 0.47825095 0.44248737 0.39243141]
mean value: 0.41212849003171664
key: test_accuracy
value: [0.66666667 0.47619048 0.52380952 0.71428571 0.76190476 0.80952381
0.61904762 0.61904762 0.52380952 0.76190476]
mean value: 0.6476190476190476
key: train_accuracy
value: [0.69312169 0.6984127 0.67724868 0.68783069 0.71428571 0.6984127
0.71957672 0.73544974 0.71957672 0.69312169]
mean value: 0.7037037037037037
key: test_fscore
value: [0.63157895 0.42105263 0.61538462 0.7 0.7826087 0.83333333
0.63636364 0.69230769 0.64285714 0.73684211]
mean value: 0.669232880010912
key: train_fscore
value: [0.71568627 0.72463768 0.70531401 0.71219512 0.73 0.71921182
0.73366834 0.75490196 0.73366834 0.71568627]
mean value: 0.7244969828653581
key: test_precision
value: [0.66666667 0.44444444 0.5 0.7 0.69230769 0.76923077
0.63636364 0.6 0.52941176 0.875 ]
mean value: 0.6413424973719091
key: train_precision
value: [0.66972477 0.66964286 0.65178571 0.66363636 0.6952381 0.66972477
0.6952381 0.7 0.6952381 0.66363636]
mean value: 0.6773865125699988
key: test_recall
value: [0.6 0.4 0.8 0.7 0.9 0.90909091
0.63636364 0.81818182 0.81818182 0.63636364]
mean value: 0.7218181818181818
key: train_recall
value: [0.76842105 0.78947368 0.76842105 0.76842105 0.76842105 0.77659574
0.77659574 0.81914894 0.77659574 0.77659574]
mean value: 0.7788689809630459
key: test_roc_auc
value: [0.66363636 0.47272727 0.53636364 0.71363636 0.76818182 0.80454545
0.61818182 0.60909091 0.50909091 0.76818182]
mean value: 0.6463636363636364
key: train_roc_auc
value: [0.69272116 0.69792833 0.67676372 0.68740202 0.71399776 0.69882419
0.71987682 0.73589026 0.71987682 0.69356103]
mean value: 0.7036842105263158
key: test_jcc
value: [0.46153846 0.26666667 0.44444444 0.53846154 0.64285714 0.71428571
0.46666667 0.52941176 0.47368421 0.58333333]
mean value: 0.5121349943486166
key: train_jcc
value: [0.55725191 0.56818182 0.54477612 0.5530303 0.57480315 0.56153846
0.57936508 0.60629921 0.57936508 0.55725191]
mean value: 0.5681863039882344
MCC on Blind test: 0.12
Accuracy on Blind test: 0.6
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01249909 0.01643276 0.01513815 0.01626801 0.01667428 0.01826024
0.01673746 0.01813006 0.0181365 0.01722217]
mean value: 0.01654987335205078
key: score_time
value: [0.00913095 0.01083755 0.01164222 0.01167583 0.01165843 0.01161766
0.01240993 0.01225019 0.01175737 0.01168919]
mean value: 0.01146693229675293
key: test_mcc
value: [0.30914104 0.45226702 0.71562645 0.67419986 0.58630197 0.60302269
0.55161872 0.82275335 0.74795759 0.67419986]
mean value: 0.6137088554293553
key: train_mcc
value: [0.80642655 0.76291765 0.83076702 0.58655527 0.76291765 0.82445214
0.94713854 0.93736014 0.91553719 0.93837953]
mean value: 0.8312451676284541
key: test_accuracy
value: [0.61904762 0.71428571 0.85714286 0.80952381 0.76190476 0.76190476
0.76190476 0.9047619 0.85714286 0.80952381]
mean value: 0.7857142857142857
key: train_accuracy
value: [0.89417989 0.86772487 0.91005291 0.75661376 0.86772487 0.9047619
0.97354497 0.96825397 0.95767196 0.96825397]
mean value: 0.9068783068783068
key: test_fscore
value: [0.69230769 0.625 0.84210526 0.83333333 0.66666667 0.70588235
0.73684211 0.91666667 0.84210526 0.77777778]
mean value: 0.7638687121272261
key: train_fscore
value: [0.9047619 0.84848485 0.90285714 0.80508475 0.84848485 0.89411765
0.97326203 0.96875 0.95698925 0.96703297]
mean value: 0.9069825383840636
key: test_precision
value: [0.5625 0.83333333 0.88888889 0.71428571 1. 1.
0.875 0.84615385 1. 1. ]
mean value: 0.8720161782661783
key: train_precision
value: [0.82608696 1. 0.9875 0.67375887 1. 1.
0.97849462 0.94897959 0.9673913 1. ]
mean value: 0.9382211341610441
key: test_recall
value: [0.9 0.5 0.8 1. 0.5 0.54545455
0.63636364 1. 0.72727273 0.63636364]
mean value: 0.7245454545454546
key: train_recall
value: [1. 0.73684211 0.83157895 1. 0.73684211 0.80851064
0.96808511 0.9893617 0.94680851 0.93617021]
mean value: 0.8954199328107503
key: test_roc_auc
value: [0.63181818 0.70454545 0.85454545 0.81818182 0.75 0.77272727
0.76818182 0.9 0.86363636 0.81818182]
mean value: 0.7881818181818182
key: train_roc_auc
value: [0.89361702 0.86842105 0.91047032 0.75531915 0.86842105 0.90425532
0.97351624 0.96836506 0.95761478 0.96808511]
mean value: 0.9068085106382979
key: test_jcc
value: [0.52941176 0.45454545 0.72727273 0.71428571 0.5 0.54545455
0.58333333 0.84615385 0.72727273 0.63636364]
mean value: 0.6264093749387867
key: train_jcc
value: [0.82608696 0.73684211 0.82291667 0.67375887 0.73684211 0.80851064
0.94791667 0.93939394 0.91752577 0.93617021]
mean value: 0.834596392928326
MCC on Blind test: 0.49
Accuracy on Blind test: 0.73
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01460838 0.01574397 0.01721883 0.01524639 0.03197598 0.01554775
0.01560473 0.01410699 0.01534986 0.01558042]
mean value: 0.017098331451416017
key: score_time
value: [0.01211929 0.01337838 0.01466918 0.01597857 0.02657938 0.01167512
0.01169777 0.0117774 0.01175189 0.01166606]
mean value: 0.014129304885864257
key: test_mcc
value: [0.26967994 0.53300179 0.62641448 0.58630197 0.90829511 0.38924947
0.55161872 0.50874702 0.80909091 0.62641448]
mean value: 0.580881390303866
key: train_mcc
value: [0.82785245 0.48764459 0.69501809 0.53983361 0.93841972 0.41041408
0.87061974 0.48948681 0.79048128 0.63728115]
mean value: 0.668705151901458
key: test_accuracy
value: [0.61904762 0.71428571 0.80952381 0.76190476 0.95238095 0.61904762
0.76190476 0.71428571 0.9047619 0.80952381]
mean value: 0.7666666666666666
key: train_accuracy
value: [0.91005291 0.69312169 0.82539683 0.72486772 0.96825397 0.64550265
0.93121693 0.6984127 0.88888889 0.78835979]
mean value: 0.8074074074074074
key: test_fscore
value: [0.66666667 0.76923077 0.77777778 0.66666667 0.94736842 0.42857143
0.73684211 0.78571429 0.90909091 0.83333333]
mean value: 0.7521262363367627
key: train_fscore
value: [0.91625616 0.76612903 0.78980892 0.62318841 0.9673913 0.44628099
0.92571429 0.7654321 0.87719298 0.8245614 ]
mean value: 0.790195557941608
key: test_precision
value: [0.57142857 0.625 0.875 1. 1. 1.
0.875 0.64705882 0.90909091 0.76923077]
mean value: 0.8271809073279661
key: train_precision
value: [0.86111111 0.62091503 1. 1. 1. 1.
1. 0.62416107 0.97402597 0.70149254]
mean value: 0.878170572895576
key: test_recall
value: [0.8 1. 0.7 0.5 0.9 0.27272727
0.63636364 1. 0.90909091 0.90909091]
mean value: 0.7627272727272727
key: train_recall
value: [0.97894737 1. 0.65263158 0.45263158 0.93684211 0.28723404
0.86170213 0.9893617 0.79787234 1. ]
mean value: 0.7957222844344904
key: test_roc_auc
value: [0.62727273 0.72727273 0.80454545 0.75 0.95 0.63636364
0.76818182 0.7 0.90454545 0.80454545]
mean value: 0.7672727272727273
key: train_roc_auc
value: [0.90968645 0.69148936 0.82631579 0.72631579 0.96842105 0.64361702
0.93085106 0.69994401 0.88840985 0.78947368]
mean value: 0.8074524076147817
key: test_jcc
value: [0.5 0.625 0.63636364 0.5 0.9 0.27272727
0.58333333 0.64705882 0.83333333 0.71428571]
mean value: 0.6212102113572702
key: train_jcc
value: [0.84545455 0.62091503 0.65263158 0.45263158 0.93684211 0.28723404
0.86170213 0.62 0.78125 0.70149254]
mean value: 0.6760153548818377
MCC on Blind test: 0.49
Accuracy on Blind test: 0.73
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.12339163 0.15838337 0.13479781 0.10761476 0.11238742 0.11786103
0.11912537 0.12101555 0.1190908 0.16042662]
mean value: 0.12740943431854249
key: score_time
value: [0.01511359 0.02331877 0.01539397 0.01508164 0.01590371 0.01642942
0.01661968 0.01648045 0.01624656 0.02382159]
mean value: 0.017440938949584962
key: test_mcc
value: [0.80909091 0.82275335 0.52727273 0.90829511 1. 0.90909091
0.71818182 0.82572282 1. 0.90909091]
mean value: 0.8429498554008733
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9047619 0.9047619 0.76190476 0.95238095 1. 0.95238095
0.85714286 0.9047619 1. 0.95238095]
mean value: 0.919047619047619
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9 0.88888889 0.76190476 0.94736842 1. 0.95238095
0.85714286 0.9 1. 0.95238095]
mean value: 0.9160066833751045
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9 1. 0.72727273 1. 1. 1.
0.9 1. 1. 1. ]
mean value: 0.9527272727272728
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9 0.8 0.8 0.9 1. 0.90909091
0.81818182 0.81818182 1. 0.90909091]
mean value: 0.8854545454545455
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90454545 0.9 0.76363636 0.95 1. 0.95454545
0.85909091 0.90909091 1. 0.95454545]
mean value: 0.9195454545454546
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.81818182 0.8 0.61538462 0.9 1. 0.90909091
0.75 0.81818182 1. 0.90909091]
mean value: 0.8519930069930071
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03891611 0.03142166 0.04908037 0.03901887 0.03456759 0.02894402
0.03260779 0.03758121 0.04022551 0.04435349]
mean value: 0.037671661376953124
key: score_time
value: [0.01717496 0.01689577 0.02591538 0.02410626 0.01824427 0.02229643
0.02497411 0.02035999 0.02795792 0.01740479]
mean value: 0.021532988548278807
key: test_mcc
value: [0.74161985 0.90829511 0.71818182 0.82275335 0.90829511 1.
0.80909091 1. 1. 0.90909091]
mean value: 0.881732704873914
key: train_mcc
value: [0.97905701 0.97883539 0.98947368 0.98947368 1. 0.98947251
1. 0.97905237 1. 0.98947251]
mean value: 0.9894837157705416
key: test_accuracy
value: [0.85714286 0.95238095 0.85714286 0.9047619 0.95238095 1.
0.9047619 1. 1. 0.95238095]
mean value: 0.9380952380952381
key: train_accuracy
value: [0.98941799 0.98941799 0.99470899 0.99470899 1. 0.99470899
1. 0.98941799 1. 0.99470899]
mean value: 0.9947089947089947
key: test_fscore
value: [0.82352941 0.94736842 0.85714286 0.88888889 0.94736842 1.
0.90909091 1. 1. 0.95238095]
mean value: 0.9325769861373576
key: train_fscore
value: [0.9893617 0.98947368 0.99470899 0.99470899 1. 0.99465241
1. 0.98924731 1. 0.99465241]
mean value: 0.9946805500418356
key: test_precision
value: [1. 1. 0.81818182 1. 1. 1.
0.90909091 1. 1. 1. ]
mean value: 0.9727272727272728
key: train_precision
value: [1. 0.98947368 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9989473684210526
key: test_recall
value: [0.7 0.9 0.9 0.8 0.9 1.
0.90909091 1. 1. 0.90909091]
mean value: 0.9018181818181819
key: train_recall
value: [0.97894737 0.98947368 0.98947368 0.98947368 1. 0.9893617
1. 0.9787234 1. 0.9893617 ]
mean value: 0.990481522956327
key: test_roc_auc
value: [0.85 0.95 0.85909091 0.9 0.95 1.
0.90454545 1. 1. 0.95454545]
mean value: 0.9368181818181818
key: train_roc_auc
value: [0.98947368 0.98941769 0.99473684 0.99473684 1. 0.99468085
1. 0.9893617 1. 0.99468085]
mean value: 0.9947088465845465
key: test_jcc
value: [0.7 0.9 0.75 0.8 0.9 1.
0.83333333 1. 1. 0.90909091]
mean value: 0.8792424242424243
key: train_jcc
value: [0.97894737 0.97916667 0.98947368 0.98947368 1. 0.9893617
1. 0.9787234 1. 0.9893617 ]
mean value: 0.989450821201941
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.06760859 0.46913624 0.26768494 0.08426118 0.09855819 0.09934545
0.07997847 0.09252858 0.13018012 0.09178209]
mean value: 0.14810638427734374
key: score_time
value: [0.02587819 0.01851845 0.02169013 0.02464533 0.02099586 0.02412891
0.02215743 0.02377272 0.03352404 0.0315032 ]
mean value: 0.02468142509460449
key: test_mcc
value: [0.13762047 0.62641448 0.52727273 0.52295779 0.82275335 0.52727273
0.39196475 0.82572282 0.4719399 0.44038551]
mean value: 0.5294304529705056
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.57142857 0.80952381 0.76190476 0.76190476 0.9047619 0.76190476
0.66666667 0.9047619 0.71428571 0.71428571]
mean value: 0.7571428571428571
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.52631579 0.77777778 0.76190476 0.73684211 0.88888889 0.76190476
0.58823529 0.9 0.66666667 0.7 ]
mean value: 0.7308536045997346
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.55555556 0.875 0.72727273 0.77777778 1. 0.8
0.83333333 1. 0.85714286 0.77777778]
mean value: 0.8203860028860029
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5 0.7 0.8 0.7 0.8 0.72727273
0.45454545 0.81818182 0.54545455 0.63636364]
mean value: 0.6681818181818182
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.56818182 0.80454545 0.76363636 0.75909091 0.9 0.76363636
0.67727273 0.90909091 0.72272727 0.71818182]
mean value: 0.7586363636363637
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.35714286 0.63636364 0.61538462 0.58333333 0.8 0.61538462
0.41666667 0.81818182 0.5 0.53846154]
mean value: 0.5880919080919081
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.17
Accuracy on Blind test: 0.6
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.35438156 0.39279032 0.34204507 0.33374977 0.37280583 0.38470411
0.33873773 0.35585856 0.37735271 0.37621069]
mean value: 0.3628636360168457
key: score_time
value: [0.01215506 0.01001287 0.00964713 0.00953436 0.01329851 0.00969267
0.0101738 0.01015115 0.01490641 0.00933409]
mean value: 0.010890603065490723
key: test_mcc
value: [0.82275335 0.90829511 0.82572282 1. 1. 1.
0.80909091 1. 1. 0.90909091]
mean value: 0.9274953099463279
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9047619 0.95238095 0.9047619 1. 1. 1.
0.9047619 1. 1. 0.95238095]
mean value: 0.9619047619047619
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.94736842 0.90909091 1. 1. 1.
0.90909091 1. 1. 0.95238095]
mean value: 0.9606820080504291
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.83333333 1. 1. 1.
0.90909091 1. 1. 1. ]
mean value: 0.9742424242424242
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.9 1. 1. 1. 1.
0.90909091 1. 1. 0.90909091]
mean value: 0.9518181818181818
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 0.95 0.90909091 1. 1. 1.
0.90454545 1. 1. 0.95454545]
mean value: 0.9618181818181818
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.9 0.83333333 1. 1. 1.
0.83333333 1. 1. 0.90909091]
mean value: 0.9275757575757576
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.05870724 0.03189158 0.03561378 0.02327633 0.02330828 0.02847648
0.0224936 0.02254534 0.03917503 0.02338982]
mean value: 0.030887746810913087
key: score_time
value: [0.01973987 0.01692939 0.01304388 0.01248932 0.01487756 0.01249003
0.01517224 0.01476288 0.0195353 0.01688313]
mean value: 0.015592360496520996
key: test_mcc
value: [0.38924947 0.46249729 0.60302269 0.53300179 0.82572282 0.74161985
0.74161985 0.90829511 0.82275335 0.74161985]
mean value: 0.6769402069598355
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61904762 0.66666667 0.76190476 0.71428571 0.9047619 0.85714286
0.85714286 0.95238095 0.9047619 0.85714286]
mean value: 0.8095238095238095
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.71428571 0.74074074 0.8 0.76923077 0.90909091 0.88
0.88 0.95652174 0.91666667 0.88 ]
mean value: 0.8446536539145235
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.55555556 0.58823529 0.66666667 0.625 0.83333333 0.78571429
0.78571429 0.91666667 0.84615385 0.78571429]
mean value: 0.7388754219636573
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.63636364 0.68181818 0.77272727 0.72727273 0.90909091 0.85
0.85 0.95 0.9 0.85 ]
mean value: 0.8127272727272727
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.55555556 0.58823529 0.66666667 0.625 0.83333333 0.78571429
0.78571429 0.91666667 0.84615385 0.78571429]
mean value: 0.7388754219636573
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.6
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.05089164 0.02581358 0.02874446 0.03665304 0.03879476 0.03609467
0.0341928 0.01681209 0.01631188 0.01475215]
mean value: 0.029906105995178223
key: score_time
value: [0.03134346 0.01229501 0.02358603 0.0211916 0.02092743 0.0194459
0.01217341 0.0122261 0.01211548 0.01222587]
mean value: 0.017753028869628908
key: test_mcc
value: [0.71562645 0.71562645 0.33636364 1. 0.80909091 0.63305416
0.55161872 0.90829511 0.82572282 0.71818182]
mean value: 0.7213580077427297
key: train_mcc
value: [0.96830907 0.93736014 0.93670891 0.9264031 0.92597156 0.94714446
0.95767077 0.95789003 0.94713854 0.93672304]
mean value: 0.9441319626080061
key: test_accuracy
value: [0.85714286 0.85714286 0.66666667 1. 0.9047619 0.80952381
0.76190476 0.95238095 0.9047619 0.85714286]
mean value: 0.8571428571428571
key: train_accuracy
value: [0.98412698 0.96825397 0.96825397 0.96296296 0.96296296 0.97354497
0.97883598 0.97883598 0.97354497 0.96825397]
mean value: 0.9719576719576719
key: test_fscore
value: [0.84210526 0.84210526 0.66666667 1. 0.9 0.8
0.73684211 0.95652174 0.9 0.85714286]
mean value: 0.8501383894518906
key: train_fscore
value: [0.98412698 0.96774194 0.96875 0.96256684 0.96335079 0.97354497
0.9787234 0.97894737 0.97326203 0.96842105]
mean value: 0.9719435380809441
key: test_precision
value: [0.88888889 0.88888889 0.63636364 1. 0.9 0.88888889
0.875 0.91666667 1. 0.9 ]
mean value: 0.8894696969696969
key: train_precision
value: [0.9893617 0.98901099 0.95876289 0.97826087 0.95833333 0.96842105
0.9787234 0.96875 0.97849462 0.95833333]
mean value: 0.9726452194511284
key: test_recall
value: [0.8 0.8 0.7 1. 0.9 0.72727273
0.63636364 1. 0.81818182 0.81818182]
mean value: 0.8200000000000001
key: train_recall
value: [0.97894737 0.94736842 0.97894737 0.94736842 0.96842105 0.9787234
0.9787234 0.9893617 0.96808511 0.9787234 ]
mean value: 0.9714669652855543
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_sl.py:148: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_sl.py:151: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_roc_auc
value: [0.85454545 0.85454545 0.66818182 1. 0.90454545 0.81363636
0.76818182 0.95 0.90909091 0.85909091]
mean value: 0.8581818181818182
key: train_roc_auc
value: [0.98415454 0.96836506 0.96819709 0.96304591 0.96293393 0.97357223
0.97883539 0.97889138 0.97351624 0.96830907]
mean value: 0.9719820828667414
key: test_jcc
value: [0.72727273 0.72727273 0.5 1. 0.81818182 0.66666667
0.58333333 0.91666667 0.81818182 0.75 ]
mean value: 0.7507575757575757
key: train_jcc
value: [0.96875 0.9375 0.93939394 0.92783505 0.92929293 0.94845361
0.95833333 0.95876289 0.94791667 0.93877551]
mean value: 0.9455013925282703
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
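The two SettingWithCopyWarning messages above are raised by `sort_values(..., inplace = True)` on `ros_CT` and `ros_BT`, which pandas suspects are slices of another DataFrame. A minimal sketch of one common way to avoid the warning, assuming those frames are built by filtering a larger results table. The `scores` frame, its `model` and `resampling` columns, and all values are hypothetical and not taken from pnca_sl.py; only the column name `test_mcc` appears in the log.

```python
import pandas as pd

# Hypothetical results table; only the 'test_mcc' column name comes from the log.
scores = pd.DataFrame({
    "model": ["A", "B", "C", "D"],
    "resampling": ["ros", "none", "ros", "ros"],
    "test_mcc": [0.72, 0.38, 0.76, 0.61],
})

# Take an explicit copy of the subset so it is an independent DataFrame.
ros_CT = scores[scores["resampling"] == "ros"].copy()

# Reassign the sorted result instead of sorting in place.
ros_CT = ros_CT.sort_values(by=["test_mcc"], ascending=False)

print(ros_CT)
```

Reassigning the sorted frame keeps the original table untouched and makes it explicit which object is being modified.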
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.32369375 0.40332413 0.32787657 0.30810881 0.24554396 0.37028098
0.35776758 0.43476796 0.61309886 0.40438581]
mean value: 0.37888484001159667
key: score_time
value: [0.04028463 0.04569244 0.02587414 0.01976562 0.02221918 0.01939678
0.02430534 0.04026508 0.05725694 0.02162313]
mean value: 0.0316683292388916
key: test_mcc
value: [0.62641448 0.61818182 0.53935989 0.90909091 0.90829511 0.71818182
0.74795759 1. 0.80909091 0.71818182]
mean value: 0.7594754344239546
key: train_mcc
value: [0.98947251 0.95789003 0.95767077 0.94714446 0.95767077 0.94714446
0.95767077 0.94713854 0.92597984 0.93672304]
mean value: 0.9524505200298267
key: test_accuracy
value: [0.80952381 0.80952381 0.76190476 0.95238095 0.95238095 0.85714286
0.85714286 1. 0.9047619 0.85714286]
mean value: 0.8761904761904762
key: train_accuracy
value: [0.99470899 0.97883598 0.97883598 0.97354497 0.97883598 0.97354497
0.97883598 0.97354497 0.96296296 0.96825397]
mean value: 0.9761904761904762
key: test_fscore
value: [0.77777778 0.8 0.70588235 0.95238095 0.94736842 0.85714286
0.84210526 1. 0.90909091 0.85714286]
mean value: 0.8648891390687057
key: train_fscore
value: [0.9947644 0.9787234 0.97894737 0.97354497 0.97894737 0.97354497
0.9787234 0.97326203 0.96296296 0.96842105]
mean value: 0.9761841938028554
key: test_precision
value: [0.875 0.8 0.85714286 0.90909091 1. 0.9
1. 1. 0.90909091 0.9 ]
mean value: 0.9150324675324675
key: train_precision
value: [0.98958333 0.98924731 0.97894737 0.9787234 0.97894737 0.96842105
0.9787234 0.97849462 0.95789474 0.95833333]
mean value: 0.9757315936976966
key: test_recall
value: [0.7 0.8 0.6 1. 0.9 0.81818182
0.72727273 1. 0.90909091 0.81818182]
mean value: 0.8272727272727273
key: train_recall
value: [1. 0.96842105 0.97894737 0.96842105 0.97894737 0.9787234
0.9787234 0.96808511 0.96808511 0.9787234 ]
mean value: 0.9767077267637179
key: test_roc_auc
value: [0.80454545 0.80909091 0.75454545 0.95454545 0.95 0.85909091
0.86363636 1. 0.90454545 0.85909091]
mean value: 0.8759090909090909
key: train_roc_auc
value: [0.99468085 0.97889138 0.97883539 0.97357223 0.97883539 0.97357223
0.97883539 0.97351624 0.96298992 0.96830907]
mean value: 0.9762038073908175
key: test_jcc
value: [0.63636364 0.66666667 0.54545455 0.90909091 0.9 0.75
0.72727273 1. 0.83333333 0.75 ]
mean value: 0.7718181818181818
key: train_jcc
value: [0.98958333 0.95833333 0.95876289 0.94845361 0.95876289 0.94845361
0.95833333 0.94791667 0.92857143 0.93877551]
mean value: 0.9535946595132898
MCC on Blind test: 0.6
Accuracy on Blind test: 0.8
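The per-fold scores logged for each model (accuracy, precision, recall, fscore, jcc, mcc, roc_auc) can all be derived from one confusion matrix. The sketch below assumes the first Ridge ClassifierCV test fold had 7 true positives, 1 false positive, 3 false negatives and 10 true negatives. These counts are not logged; they are inferred from the reported accuracy (0.80952381), precision (0.875) and recall (0.7), and they reproduce the first entry of each `test_*` array above.

```python
from math import sqrt

# Assumed confusion counts for the first test fold (inferred, not logged):
# 21 samples, of which 10 are positive and 11 negative.
tp, fp, fn, tn = 7, 1, 3, 10

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
fscore = 2 * tp / (2 * tp + fp + fn)
jcc = tp / (tp + fp + fn)  # Jaccard index of the positive class
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
# With hard 0/1 predictions, ROC AUC reduces to the mean of recall and specificity.
roc_auc = (recall + specificity) / 2

for name, value in [
    ("test_mcc", mcc), ("test_accuracy", accuracy), ("test_fscore", fscore),
    ("test_precision", precision), ("test_recall", recall),
    ("test_roc_auc", roc_auc), ("test_jcc", jcc),
]:
    print(f"{name}: {value:.8f}")
```

Because the test folds hold only about 21 samples, a single changed prediction moves MCC by several hundredths, which is consistent with the wide fold-to-fold spread in the `test_mcc` arrays.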
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.07115984 0.11514401 0.103935 0.06160641 0.0467484 0.14352918
0.13648367 0.05514574 0.06950617 0.10242081]
mean value: 0.09056792259216309
key: score_time
value: [0.02112889 0.01284909 0.02215528 0.02116251 0.02008915 0.0174675
0.0115118 0.03403592 0.04912376 0.01486492]
mean value: 0.022438883781433105
key: test_mcc
value: [ 0.41475753 0.54761905 0.73192505 0.41475753 0.07142857 0.73192505
0.28288947 0.38575837 0.41475753 -0.23809524]
mean value: 0.3757722933220292
key: train_mcc
value: [0.8120433 0.82904734 0.82904734 0.88144164 0.8120433 0.77888301
0.82904734 0.82958203 0.81310356 0.84732411]
mean value: 0.8261562964430988
key: test_accuracy
value: [0.69230769 0.76923077 0.84615385 0.69230769 0.53846154 0.84615385
0.61538462 0.69230769 0.69230769 0.38461538]
mean value: 0.676923076923077
key: train_accuracy
value: [0.90598291 0.91452991 0.91452991 0.94017094 0.90598291 0.88888889
0.91452991 0.91452991 0.90598291 0.92307692]
mean value: 0.9128205128205128
key: test_fscore
value: [0.71428571 0.76923077 0.85714286 0.71428571 0.5 0.83333333
0.54545455 0.75 0.66666667 0.42857143]
mean value: 0.6778971028971029
key: train_fscore
value: [0.90756303 0.91525424 0.91525424 0.94214876 0.90756303 0.8907563
0.9137931 0.91525424 0.90756303 0.92436975]
mean value: 0.9139519701693681
key: test_precision
value: [0.625 0.71428571 0.75 0.625 0.5 1.
0.75 0.66666667 0.8 0.42857143]
mean value: 0.685952380952381
key: train_precision
value: [0.9 0.91525424 0.91525424 0.91935484 0.9 0.86885246
0.9137931 0.9 0.8852459 0.90163934]
mean value: 0.9019394121652258
key: test_recall
value: [0.83333333 0.83333333 1. 0.83333333 0.5 0.71428571
0.42857143 0.85714286 0.57142857 0.42857143]
mean value: 0.7
key: train_recall
value: [0.91525424 0.91525424 0.91525424 0.96610169 0.91525424 0.9137931
0.9137931 0.93103448 0.93103448 0.94827586]
mean value: 0.9265049678550555
key: test_roc_auc
value: [0.70238095 0.77380952 0.85714286 0.70238095 0.53571429 0.85714286
0.63095238 0.67857143 0.70238095 0.38095238]
mean value: 0.6821428571428572
key: train_roc_auc
value: [0.90590298 0.91452367 0.91452367 0.9399474 0.90590298 0.88909994
0.91452367 0.91466978 0.90619521 0.92329047]
mean value: 0.9128579777907656
key: test_jcc
value: [0.55555556 0.625 0.75 0.55555556 0.33333333 0.71428571
0.375 0.6 0.5 0.27272727]
mean value: 0.5281457431457431
key: train_jcc
value: [0.83076923 0.84375 0.84375 0.890625 0.83076923 0.8030303
0.84126984 0.84375 0.83076923 0.859375 ]
mean value: 0.8417857836607837
MCC on Blind test: 0.61
Accuracy on Blind test: 0.8
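The ConvergenceWarning raised for the Logistic Regression run suggests increasing `max_iter` or scaling the data. The logged pipeline already applies MinMaxScaler to the numerical columns, so raising the iteration limit is the option left open. A minimal sketch on synthetic data, assuming `max_iter=1000` is large enough. That value, the synthetic dataset and the simplified single-scaler pipeline are illustrative assumptions, not settings from the original script.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix (hypothetical data).
X, y = make_classification(n_samples=200, n_features=30, random_state=42)

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    # max_iter raised from the default of 100; 1000 is an arbitrary choice.
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

cv_out = cross_validate(
    pipe, X, y, cv=10,
    scoring={"mcc": make_scorer(matthews_corrcoef)},
    return_train_score=True,
)

print("mean test_mcc:", cv_out["test_mcc"].mean())
print("mean train_mcc:", cv_out["train_mcc"].mean())
```

`LogisticRegressionCV` accepts the same `max_iter` parameter, so the same change could be tried there if the warnings persist.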
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [2.14313459 2.65574384 2.69484901 2.17424965 2.27060723 2.20656586
2.04485202 2.53665352 2.39897084 2.08672571]
mean value: 2.3212352275848387
key: score_time
value: [0.01857519 0.02521801 0.01422095 0.02029347 0.03888059 0.01733184
0.02459574 0.04596186 0.0188508 0.01879811]
mean value: 0.024272656440734862
key: test_mcc
value: [0.41475753 0.54761905 0.54761905 0.41475753 0.59160798 0.85714286
0.23809524 0.53674504 0.69047619 0.09759001]
mean value: 0.49364104686851423
key: train_mcc
value: [0.93218361 1. 1. 0.88144164 1. 0.8974284
1. 1. 1. 1. ]
mean value: 0.9711053657014244
key: test_accuracy
value: [0.69230769 0.76923077 0.76923077 0.69230769 0.76923077 0.92307692
0.61538462 0.76923077 0.84615385 0.53846154]
mean value: 0.7384615384615385
key: train_accuracy
value: [0.96581197 1. 1. 0.94017094 1. 0.94871795
1. 1. 1. 1. ]
mean value: 0.9854700854700855
key: test_fscore
value: [0.71428571 0.76923077 0.76923077 0.71428571 0.66666667 0.92307692
0.61538462 0.8 0.85714286 0.5 ]
mean value: 0.7329304029304029
key: train_fscore
value: [0.96551724 1. 1. 0.94214876 1. 0.94827586
1. 1. 1. 1. ]
mean value: 0.9855941863778854
key: test_precision
value: [0.625 0.71428571 0.71428571 0.625 1. 1.
0.66666667 0.75 0.85714286 0.6 ]
mean value: 0.7552380952380953
key: train_precision
value: [0.98245614 1. 1. 0.91935484 1. 0.94827586
1. 1. 1. 1. ]
mean value: 0.985008684112952
key: test_recall
value: [0.83333333 0.83333333 0.83333333 0.83333333 0.5 0.85714286
0.57142857 0.85714286 0.85714286 0.42857143]
mean value: 0.7404761904761905
key: train_recall
value: [0.94915254 1. 1. 0.96610169 1. 0.94827586
1. 1. 1. 1. ]
mean value: 0.9863530099357101
key: test_roc_auc
value: [0.70238095 0.77380952 0.77380952 0.70238095 0.75 0.92857143
0.61904762 0.76190476 0.8452381 0.54761905]
mean value: 0.7404761904761905
key: train_roc_auc
value: [0.96595558 1. 1. 0.9399474 1. 0.9487142
1. 1. 1. 1. ]
mean value: 0.9854617182933957
key: test_jcc
value: [0.55555556 0.625 0.625 0.55555556 0.5 0.85714286
0.44444444 0.66666667 0.75 0.33333333]
mean value: 0.5912698412698413
key: train_jcc
value: [0.93333333 1. 1. 0.890625 1. 0.90163934
1. 1. 1. 1. ]
mean value: 0.9725597677595629
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
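Note on reading the per-fold scores above: the log prints only the derived metrics, not the confusion counts behind them. The sketch below is an illustration, not part of the original run. It assumes that the second LogisticRegressionCV test fold held 13 samples with TP=5, FP=2, FN=1, TN=5. These counts are inferred from the logged recall (0.8333), precision (0.7143) and accuracy (0.7692), and are not printed anywhere in the log. It also assumes the ROC AUC was computed from hard class predictions, in which case it reduces to the mean of recall and specificity. The function name `metrics_from_counts` is hypothetical.

```python
from math import sqrt


def metrics_from_counts(tp, fp, fn, tn):
    """Compute the scores named in the log from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
        # With hard 0/1 predictions, ROC AUC equals (recall + specificity) / 2.
        "roc_auc": (recall + specificity) / 2,
        # Jaccard index of the positive class.
        "jcc": tp / (tp + fp + fn),
    }


# Assumed counts for the second test fold (inferred, not logged).
fold2 = metrics_from_counts(tp=5, fp=2, fn=1, tn=5)
for key, value in fold2.items():
    print(f"test_{key}: {value:.8f}")
```

Under these assumed counts the computed values agree with the second entry of each test_* array logged for LogisticRegressionCV, which suggests each fold score rests on only 13 test samples.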
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01936674 0.01057959 0.01035166 0.01044059 0.0103271 0.01035023
0.01044679 0.01057243 0.01038861 0.01044059]
mean value: 0.011326432228088379
key: score_time
value: [0.01023579 0.01007533 0.01012969 0.01011801 0.01018596 0.01004577
0.0100286 0.01010728 0.01014471 0.01001549]
mean value: 0.010108661651611329
key: test_mcc
value: [ 0.09759001 0.23809524 0.38095238 0.09759001 0.38095238 0.53674504
0.23809524 0.38575837 0.22537447 -0.41475753]
mean value: 0.21663956046363453
key: train_mcc
value: [0.62939175 0.56318771 0.50572841 0.54006981 0.56027975 0.52451345
0.63808526 0.55801254 0.55654161 0.57355974]
mean value: 0.5649370021355652
key: test_accuracy
value: [0.53846154 0.61538462 0.69230769 0.53846154 0.69230769 0.76923077
0.61538462 0.69230769 0.61538462 0.30769231]
mean value: 0.6076923076923078
key: train_accuracy
value: [0.81196581 0.77777778 0.75213675 0.76923077 0.77777778 0.76068376
0.81196581 0.76923077 0.74358974 0.78632479]
mean value: 0.7760683760683761
key: test_fscore
value: [0.57142857 0.61538462 0.66666667 0.57142857 0.66666667 0.8
0.61538462 0.75 0.70588235 0.4 ]
mean value: 0.6362842059900883
key: train_fscore
value: [0.82539683 0.796875 0.76422764 0.7804878 0.79365079 0.7704918
0.828125 0.79389313 0.79166667 0.78991597]
mean value: 0.7934730632304993
key: test_precision
value: [0.5 0.57142857 0.66666667 0.5 0.66666667 0.75
0.66666667 0.66666667 0.6 0.375 ]
mean value: 0.5963095238095237
key: train_precision
value: [0.7761194 0.73913043 0.734375 0.75 0.74626866 0.734375
0.75714286 0.71232877 0.6627907 0.7704918 ]
mean value: 0.7383022619703353
key: test_recall
value: [0.66666667 0.66666667 0.66666667 0.66666667 0.66666667 0.85714286
0.57142857 0.85714286 0.85714286 0.42857143]
mean value: 0.6904761904761905
key: train_recall
value: [0.88135593 0.86440678 0.79661017 0.81355932 0.84745763 0.81034483
0.9137931 0.89655172 0.98275862 0.81034483]
mean value: 0.8617182933956751
key: test_roc_auc
value: [0.54761905 0.61904762 0.69047619 0.54761905 0.69047619 0.76190476
0.61904762 0.67857143 0.5952381 0.29761905]
mean value: 0.6047619047619048
key: train_roc_auc
value: [0.81136762 0.77703098 0.75175336 0.76884863 0.77717709 0.76110462
0.81282876 0.77030976 0.7456166 0.78652835]
mean value: 0.7762565751022794
key: test_jcc
value: [0.4 0.44444444 0.5 0.4 0.5 0.66666667
0.44444444 0.6 0.54545455 0.25 ]
mean value: 0.4751010101010101
key: train_jcc
value: [0.7027027 0.66233766 0.61842105 0.64 0.65789474 0.62666667
0.70666667 0.65822785 0.65517241 0.65277778]
mean value: 0.6580867527519529
MCC on Blind test: 0.29
Accuracy on Blind test: 0.67
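Note on the "mean value" lines: each one appears to be the arithmetic mean of the ten per-fold values printed just above it. The sketch below is an illustration, not part of the original run. It recomputes the mean for the Gaussian NB test_mcc values copied from the log and also reports the sample standard deviation, which the log does not print. The variable names are hypothetical, and the logged values are rounded to eight decimals, so the recomputed mean matches only approximately.

```python
from statistics import fmean, stdev

# Gaussian NB test_mcc per fold, copied from the log (rounded to 8 decimals).
test_mcc = [
    0.09759001, 0.23809524, 0.38095238, 0.09759001, 0.38095238,
    0.53674504, 0.23809524, 0.38575837, 0.22537447, -0.41475753,
]

mean_mcc = fmean(test_mcc)   # arithmetic mean across the 10 folds
spread = stdev(test_mcc)     # sample standard deviation (not in the log)

print(f"mean test_mcc: {mean_mcc:.6f}")
print(f"std  test_mcc: {spread:.6f}")
print(f"min / max    : {min(test_mcc):.6f} / {max(test_mcc):.6f}")
```

Reporting the spread alongside the mean may help when comparing models, since single folds here range from a negative MCC to above 0.5.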
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01078939 0.01074529 0.01073861 0.01068044 0.01072288 0.01063657
0.01069188 0.01077175 0.01050353 0.01064897]
mean value: 0.010692930221557618
key: score_time
value: [0.01027346 0.0101223 0.01015067 0.01009154 0.01016331 0.01005459
0.01008916 0.01028109 0.01001859 0.01019144]
mean value: 0.010143613815307618
key: test_mcc
value: [-0.05143445 0.54761905 0.53674504 -0.07142857 -0.28288947 0.54761905
0.28288947 0.38095238 0.38095238 -0.38575837]
mean value: 0.18852665009433853
key: train_mcc
value: [0.5393392 0.54074089 0.55597781 0.58971362 0.59133581 0.61080452
0.6087526 0.59794138 0.56027975 0.64168717]
mean value: 0.5836572739419282
key: test_accuracy
value: [0.46153846 0.76923077 0.76923077 0.46153846 0.38461538 0.76923077
0.61538462 0.69230769 0.69230769 0.30769231]
mean value: 0.5923076923076923
key: train_accuracy
value: [0.76923077 0.76923077 0.77777778 0.79487179 0.79487179 0.8034188
0.8034188 0.79487179 0.77777778 0.82051282]
mean value: 0.7905982905982906
key: test_fscore
value: [0.53333333 0.76923077 0.72727273 0.46153846 0.2 0.76923077
0.54545455 0.71428571 0.71428571 0.18181818]
mean value: 0.5616450216450216
key: train_fscore
value: [0.76521739 0.76106195 0.77586207 0.79661017 0.78947368 0.78899083
0.79279279 0.77358491 0.75925926 0.81415929]
mean value: 0.7817012336310473
key: test_precision
value: [0.44444444 0.71428571 0.8 0.42857143 0.25 0.83333333
0.75 0.71428571 0.71428571 0.25 ]
mean value: 0.589920634920635
key: train_precision
value: [0.78571429 0.7962963 0.78947368 0.79661017 0.81818182 0.84313725
0.83018868 0.85416667 0.82 0.83636364]
mean value: 0.8170132491071999
key: test_recall
value: [0.66666667 0.83333333 0.66666667 0.5 0.16666667 0.71428571
0.42857143 0.71428571 0.71428571 0.14285714]
mean value: 0.5547619047619048
key: train_recall
value: [0.74576271 0.72881356 0.76271186 0.79661017 0.76271186 0.74137931
0.75862069 0.70689655 0.70689655 0.79310345]
mean value: 0.7503506721215664
key: test_roc_auc
value: [0.47619048 0.77380952 0.76190476 0.46428571 0.36904762 0.77380952
0.63095238 0.69047619 0.69047619 0.32142857]
mean value: 0.5952380952380952
key: train_roc_auc
value: [0.76943308 0.76957919 0.77790766 0.79485681 0.79514904 0.80289305
0.80303916 0.79412624 0.77717709 0.82028054]
mean value: 0.7904441846873174
key: test_jcc
value: [0.36363636 0.625 0.57142857 0.3 0.11111111 0.625
0.375 0.55555556 0.55555556 0.1 ]
mean value: 0.4182287157287157
key: train_jcc
value: [0.61971831 0.61428571 0.63380282 0.66197183 0.65217391 0.65151515
0.65671642 0.63076923 0.6119403 0.68656716]
mean value: 0.6419460847957068
MCC on Blind test: 0.17
Accuracy on Blind test: 0.6
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01046824 0.01135015 0.01026702 0.01008916 0.01003551 0.0104847
0.01012325 0.01084805 0.010113 0.01030326]
mean value: 0.010408234596252442
key: score_time
value: [0.01920247 0.0241816 0.01840687 0.02027678 0.01942277 0.01908255
0.02879333 0.0157578 0.01938295 0.0225172 ]
mean value: 0.020702433586120606
key: test_mcc
value: [ 0.38095238 0.38095238 0.05143445 0.54761905 -0.23809524 -0.05143445
-0.54761905 0.21957752 0.50709255 -0.7200823 ]
mean value: 0.053039729323695814
key: train_mcc
value: [0.38893486 0.31846508 0.49235618 0.42340863 0.38583198 0.37313533
0.47043398 0.38607028 0.39185302 0.39185302]
mean value: 0.4022342373708152
key: test_accuracy
value: [0.69230769 0.69230769 0.53846154 0.76923077 0.38461538 0.46153846
0.23076923 0.61538462 0.69230769 0.15384615]
mean value: 0.5230769230769231
key: train_accuracy
value: [0.69230769 0.65811966 0.74358974 0.70940171 0.69230769 0.68376068
0.73504274 0.69230769 0.69230769 0.69230769]
mean value: 0.6991452991452991
key: test_fscore
value: [0.66666667 0.66666667 0.4 0.76923077 0.33333333 0.36363636
0.28571429 0.66666667 0.6 0. ]
mean value: 0.4751914751914752
key: train_fscore
value: [0.67272727 0.64285714 0.72727273 0.69090909 0.68421053 0.64761905
0.72566372 0.67272727 0.65384615 0.65384615]
mean value: 0.677167910493481
key: test_precision
value: [0.66666667 0.66666667 0.5 0.71428571 0.33333333 0.5
0.28571429 0.625 1. 0. ]
mean value: 0.5291666666666667
key: train_precision
value: [0.7254902 0.67924528 0.78431373 0.74509804 0.70909091 0.72340426
0.74545455 0.71153846 0.73913043 0.73913043]
mean value: 0.7301896284771464
key: test_recall
value: [0.66666667 0.66666667 0.33333333 0.83333333 0.33333333 0.28571429
0.28571429 0.71428571 0.42857143 0. ]
mean value: 0.45476190476190476
key: train_recall
value: [0.62711864 0.61016949 0.6779661 0.6440678 0.66101695 0.5862069
0.70689655 0.63793103 0.5862069 0.5862069 ]
mean value: 0.6323787258912916
key: test_roc_auc
value: [0.69047619 0.69047619 0.52380952 0.77380952 0.38095238 0.47619048
0.22619048 0.60714286 0.71428571 0.16666667]
mean value: 0.525
key: train_roc_auc
value: [0.69286967 0.65853302 0.74415546 0.70996493 0.69257744 0.68293396
0.73480421 0.69184687 0.69140853 0.69140853]
mean value: 0.6990502630040912
key: test_jcc
value: [0.5 0.5 0.25 0.625 0.2 0.22222222
0.16666667 0.5 0.42857143 0. ]
mean value: 0.33924603174603174
key: train_jcc
value: [0.50684932 0.47368421 0.57142857 0.52777778 0.52 0.47887324
0.56944444 0.50684932 0.48571429 0.48571429]
mean value: 0.5126335445179286
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
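The last K-Nearest Neighbors fold above (test_accuracy 0.15384615, test_recall 0., test_mcc -0.7200823, test_roc_auc 0.16666667) can be checked by hand. The sketch below assumes a 13-sample test fold with 7 positives and 6 negatives. Those counts are inferred from the logged fractions and are not printed anywhere in the log. It also assumes the ROC AUC was scored on hard class labels, in which case it equals balanced accuracy. The function name `metrics_from_counts` is illustrative only.

```python
import math

def metrics_from_counts(tp, fp, tn, fn):
    """Derive the logged metric families from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    union = tp + fp + fn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "fscore": f1,
        "jcc": tp / union if union else 0.0,
        # Matthews correlation coefficient; defined as 0 when a margin is empty.
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
        # Equals ROC AUC only when the score is computed from hard labels.
        "balanced_accuracy": (recall + specificity) / 2,
    }

# Assumed counts for the tenth KNN fold (inferred, not logged):
# no positive was recovered (tp=0, fn=7) and 2 of 6 negatives were correct.
fold = metrics_from_counts(tp=0, fp=4, tn=2, fn=7)
for name, value in fold.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the recomputed accuracy, MCC and balanced accuracy agree with the logged values for that fold, which suggests the fold contained no correctly predicted positives at all.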
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01252365 0.01188517 0.01209474 0.01199675 0.01207352 0.01212645
0.01216841 0.01206255 0.01212716 0.01190042]
mean value: 0.012095880508422852
key: score_time
value: [0.01078773 0.01077604 0.01067948 0.01055169 0.01071 0.01069093
0.01049042 0.01056886 0.01054859 0.01059175]
mean value: 0.010639548301696777
key: test_mcc
value: [ 0.23809524 0.53674504 0.54761905 0.41475753 -0.23809524 0.54761905
0.14085904 0.53674504 0.41475753 -0.38095238]
mean value: 0.2758149898990107
key: train_mcc
value: [0.66472504 0.64361355 0.64361355 0.72698045 0.71044192 0.71177678
0.79485681 0.67593781 0.693731 0.67743539]
mean value: 0.6943112296352275
key: test_accuracy
value: [0.61538462 0.76923077 0.76923077 0.69230769 0.38461538 0.76923077
0.53846154 0.76923077 0.69230769 0.30769231]
mean value: 0.6307692307692307
key: train_accuracy
value: [0.82905983 0.82051282 0.82051282 0.86324786 0.85470085 0.85470085
0.8974359 0.83760684 0.84615385 0.83760684]
mean value: 0.8461538461538461
key: test_fscore
value: [0.61538462 0.72727273 0.76923077 0.71428571 0.33333333 0.76923077
0.4 0.8 0.66666667 0.30769231]
mean value: 0.6103096903096903
key: train_fscore
value: [0.81818182 0.81415929 0.81415929 0.86206897 0.85217391 0.84684685
0.89655172 0.83185841 0.83928571 0.82882883]
mean value: 0.8404114801992302
key: test_precision
value: [0.57142857 0.8 0.71428571 0.625 0.33333333 0.83333333
0.66666667 0.75 0.8 0.33333333]
mean value: 0.6427380952380952
key: train_precision
value: [0.88235294 0.85185185 0.85185185 0.87719298 0.875 0.88679245
0.89655172 0.85454545 0.87037037 0.86792453]
mean value: 0.8714434157522146
key: test_recall
value: [0.66666667 0.66666667 0.83333333 0.83333333 0.33333333 0.71428571
0.28571429 0.85714286 0.57142857 0.28571429]
mean value: 0.6047619047619047
key: train_recall
value: [0.76271186 0.77966102 0.77966102 0.84745763 0.83050847 0.81034483
0.89655172 0.81034483 0.81034483 0.79310345]
mean value: 0.8120689655172414
key: test_roc_auc
value: [0.61904762 0.76190476 0.77380952 0.70238095 0.38095238 0.77380952
0.55952381 0.76190476 0.70238095 0.30952381]
mean value: 0.6345238095238096
key: train_roc_auc
value: [0.82963179 0.82086499 0.82086499 0.86338399 0.85490941 0.85432496
0.8974284 0.8373758 0.84585038 0.83722969]
mean value: 0.8461864406779661
key: test_jcc
value: [0.44444444 0.57142857 0.625 0.55555556 0.2 0.625
0.25 0.66666667 0.5 0.18181818]
mean value: 0.461991341991342
key: train_jcc
value: [0.69230769 0.68656716 0.68656716 0.75757576 0.74242424 0.734375
0.8125 0.71212121 0.72307692 0.70769231]
mean value: 0.7255207463556345
MCC on Blind test: 0.29
Accuracy on Blind test: 0.67
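Each metric in this log is written as a `key:` line, a `value:` line holding a NumPy-style array that may wrap onto the following line, and a `mean value:` line. A minimal parsing sketch, assuming that layout holds; the function name `parse_metric_blocks` and the `SAMPLE` string (copied from the SVM test_mcc entry above) are illustrative and not part of the original scripts.

```python
import re

KEY_RE = re.compile(r"^key:\s*(\S+)")
MEAN_RE = re.compile(r"^mean value:\s*(\S+)")
# Matches values as NumPy prints them, e.g. "0.5", "1.", "-0.23809524".
NUM_RE = re.compile(r"-?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?")

def parse_metric_blocks(text):
    """Return {metric: {"folds": [...], "mean": float}} from log text."""
    results = {}
    key = None
    buffer = []
    for raw in text.splitlines():
        line = raw.strip()
        match = KEY_RE.match(line)
        if match:
            key = match.group(1)
            buffer = []
            continue
        match = MEAN_RE.match(line)
        if match and key is not None:
            folds = [float(x) for x in NUM_RE.findall(" ".join(buffer))]
            results[key] = {"folds": folds, "mean": float(match.group(1))}
            key = None
            continue
        if key is not None:
            # Drop the "value:" label; wrapped continuation lines have none.
            if line.startswith("value:"):
                line = line[len("value:"):]
            buffer.append(line)
    return results

SAMPLE = """key: test_mcc
value: [ 0.23809524 0.53674504 0.54761905 0.41475753 -0.23809524 0.54761905
0.14085904 0.53674504 0.41475753 -0.38095238]
mean value: 0.2758149898990107
"""

parsed = parse_metric_blocks(SAMPLE)
print(parsed["test_mcc"]["mean"], len(parsed["test_mcc"]["folds"]))
```

This sketch does not filter stderr output. Where a warning is interleaved with a `value:` line, as happens elsewhere in this log, digits inside the warning text would be picked up as fold values, so such lines would need to be removed first.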
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.92927122 1.5305016 1.36941218 1.77340817 1.98329568 1.87210155
2.13366818 1.67377758 1.89807558 2.13494849]
mean value: 1.7298460245132445
key: score_time
value: [0.02280736 0.01201725 0.01104426 0.0204463 0.02194667 0.02331734
0.02230024 0.03766036 0.01530027 0.01807332]
mean value: 0.02049133777618408
key: test_mcc
value: [ 0.28288947 0.54761905 0.38095238 0.09759001 -0.07142857 0.73192505
-0.21957752 0.21957752 0.85714286 -0.41475753]
mean value: 0.24119327202193427
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61538462 0.76923077 0.69230769 0.53846154 0.46153846 0.84615385
0.38461538 0.61538462 0.92307692 0.30769231]
mean value: 0.6153846153846154
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.76923077 0.66666667 0.57142857 0.46153846 0.83333333
0.33333333 0.66666667 0.92307692 0.4 ]
mean value: 0.6291941391941391
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.55555556 0.71428571 0.66666667 0.5 0.42857143 1.
0.4 0.625 1. 0.375 ]
mean value: 0.6265079365079365
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.83333333 0.66666667 0.66666667 0.5 0.71428571
0.28571429 0.71428571 0.85714286 0.42857143]
mean value: 0.65
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.63095238 0.77380952 0.69047619 0.54761905 0.46428571 0.85714286
0.39285714 0.60714286 0.92857143 0.29761905]
mean value: 0.6190476190476191
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.625 0.5 0.4 0.3 0.71428571
0.2 0.5 0.85714286 0.25 ]
mean value: 0.48464285714285715
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.17
Accuracy on Blind test: 0.6
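The MLP block above reports a training MCC of 1.0 in every fold against a mean test MCC of about 0.24, a gap that is commonly read as overfitting. A small sketch, using only the fold values logged above, that quantifies the gap and the fold-to-fold spread; the helper name `summarise_folds` is hypothetical.

```python
from statistics import mean, stdev

def summarise_folds(train_scores, test_scores):
    """Compare mean train and test scores across cross-validation folds."""
    train_mean = mean(train_scores)
    test_mean = mean(test_scores)
    return {
        "train_mean": train_mean,
        "test_mean": test_mean,
        "gap": train_mean - test_mean,      # large gap hints at overfitting
        "test_stdev": stdev(test_scores),   # sample standard deviation
    }

# Per-fold values copied from the MLP test_mcc and train_mcc entries.
mlp_test_mcc = [0.28288947, 0.54761905, 0.38095238, 0.09759001, -0.07142857,
                0.73192505, -0.21957752, 0.21957752, 0.85714286, -0.41475753]
mlp_train_mcc = [1.0] * 10

summary = summarise_folds(mlp_train_mcc, mlp_test_mcc)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The same helper can be applied to any other model block in the log by pasting in its per-fold train and test values.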
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02312112 0.02405119 0.02581382 0.03118682 0.04043317 0.035918
0.03535628 0.03518629 0.01530671 0.02257276]
mean value: 0.028894615173339844
key: score_time
value: [0.02141547 0.03261065 0.03187895 0.02285576 0.02194715 0.02182031
0.02029324 0.02223802 0.01271105 0.01209235]
mean value: 0.02198629379272461
key: test_mcc
value: [0.23809524 0.6172134 0.85391256 0.85714286 0.69047619 1.
1. 0.85714286 0.85714286 0.73192505]
mean value: 0.7703051018389734
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61538462 0.76923077 0.92307692 0.92307692 0.84615385 1.
1. 0.92307692 0.92307692 0.84615385]
mean value: 0.8769230769230769
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.61538462 0.8 0.90909091 0.92307692 0.83333333 1.
1. 0.92307692 0.92307692 0.83333333]
mean value: 0.876037296037296
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.57142857 0.66666667 1. 0.85714286 0.83333333 1.
1. 1. 1. 1. ]
mean value: 0.8928571428571428
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 1. 0.83333333 1. 0.83333333 1.
1. 0.85714286 0.85714286 0.71428571]
mean value: 0.8761904761904762
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.61904762 0.78571429 0.91666667 0.92857143 0.8452381 1.
1. 0.92857143 0.92857143 0.85714286]
mean value: 0.8809523809523809
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.44444444 0.66666667 0.83333333 0.85714286 0.71428571 1.
1. 0.85714286 0.85714286 0.71428571]
mean value: 0.7944444444444444
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.76
Accuracy on Blind test: 0.87
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.08921742 0.08900595 0.09549665 0.12468243 0.12501621 0.12542987
0.11990476 0.1254077 0.12537408 0.09747744]
mean value: 0.11170125007629395
key: score_time
value: [0.01770282 0.01791883 0.0226357 0.02327061 0.02341628 0.02357483
0.02346158 0.02348065 0.02347827 0.01771784]
mean value: 0.021665740013122558
key: test_mcc
value: [ 0.09759001 0.54761905 0.21957752 0.41475753 -0.09759001 1.
0.14085904 0.54761905 0.6172134 0.09759001]
mean value: 0.3585235592252616
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.53846154 0.76923077 0.61538462 0.69230769 0.46153846 1.
0.53846154 0.76923077 0.76923077 0.53846154]
mean value: 0.6692307692307693
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.57142857 0.76923077 0.54545455 0.71428571 0.36363636 1.
0.4 0.76923077 0.72727273 0.5 ]
mean value: 0.636053946053946
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.5 0.71428571 0.6 0.625 0.4 1.
0.66666667 0.83333333 1. 0.6 ]
mean value: 0.6939285714285715
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.83333333 0.5 0.83333333 0.33333333 1.
0.28571429 0.71428571 0.57142857 0.42857143]
mean value: 0.6166666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.54761905 0.77380952 0.60714286 0.70238095 0.45238095 1.
0.55952381 0.77380952 0.78571429 0.54761905]
mean value: 0.675
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.4 0.625 0.375 0.55555556 0.22222222 1.
0.25 0.625 0.57142857 0.33333333]
mean value: 0.4957539682539682
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00922561 0.01216745 0.00913382 0.00917363 0.00927591 0.00904441
0.00901008 0.00905704 0.00984097 0.0092206 ]
mean value: 0.009514951705932617
key: score_time
value: [0.00917149 0.01156068 0.00884557 0.00909114 0.00899363 0.00888038
0.00885415 0.00877523 0.00928211 0.00903535]
mean value: 0.009248971939086914
key: test_mcc
value: [ 0.28288947 0.38095238 0.05143445 0.28288947 -0.41475753 0.59160798
0.09759001 0.38095238 0.09759001 -0.53674504]
mean value: 0.12144035835279779
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61538462 0.69230769 0.53846154 0.61538462 0.30769231 0.76923077
0.53846154 0.69230769 0.53846154 0.23076923]
mean value: 0.5538461538461539
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.66666667 0.4 0.66666667 0.18181818 0.82352941
0.5 0.71428571 0.5 0.16666667]
mean value: 0.5286299974535269
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.55555556 0.66666667 0.5 0.55555556 0.2 0.7
0.6 0.71428571 0.6 0.2 ]
mean value: 0.5292063492063492
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.66666667 0.33333333 0.83333333 0.16666667 1.
0.42857143 0.71428571 0.42857143 0.14285714]
mean value: 0.5547619047619048
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.63095238 0.69047619 0.52380952 0.63095238 0.29761905 0.75
0.54761905 0.69047619 0.54761905 0.23809524]
mean value: 0.5547619047619048
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.5 0.25 0.5 0.1 0.7
0.33333333 0.55555556 0.33333333 0.09090909]
mean value: 0.38631313131313133
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.29
Accuracy on Blind test: 0.4
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.12453771 1.23042583 1.6623981 1.80660892 1.10945749 1.1398468
1.11876416 1.10889626 1.09741282 2.23863149]
mean value: 1.3636979579925537
key: score_time
value: [0.0901432 0.105124 0.27335095 0.0891552 0.09121609 0.09351397
0.08874822 0.08856773 0.08960176 0.21340656]
mean value: 0.1222827672958374
key: test_mcc
value: [0.38095238 0.6172134 0.53674504 0.41475753 0.38575837 1.
0.39477102 0.73192505 0.28288947 0.28288947]
mean value: 0.5027901748379063
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.69230769 0.76923077 0.76923077 0.69230769 0.69230769 1.
0.61538462 0.84615385 0.61538462 0.61538462]
mean value: 0.7307692307692308
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.8 0.72727273 0.71428571 0.6 1.
0.44444444 0.83333333 0.54545455 0.54545455]
mean value: 0.6876911976911977
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.66666667 0.8 0.625 0.75 1.
1. 1. 0.75 0.75 ]
mean value: 0.8008333333333333
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 1. 0.66666667 0.83333333 0.5 1.
0.28571429 0.71428571 0.42857143 0.42857143]
mean value: 0.6523809523809524
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.69047619 0.78571429 0.76190476 0.70238095 0.67857143 1.
0.64285714 0.85714286 0.63095238 0.63095238]
mean value: 0.7380952380952381
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: test_jcc
value: [0.5 0.66666667 0.57142857 0.55555556 0.42857143 1.
0.28571429 0.71428571 0.375 0.375 ]
mean value: 0.5472222222222222
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
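The per-fold metrics above are all functions of one confusion matrix per fold. As a rough sketch, the counts below are an assumption inferred from the first Random Forest fold (they are not printed in this log); plugging them into the standard formulas gives values consistent with the first entries of test_accuracy, test_precision, test_recall, test_fscore, test_jcc, test_roc_auc and test_mcc.

```python
import math

# Assumed confusion counts for one 13-sample fold, inferred from the logged
# fold-1 metrics of the Random Forest run (hypothetical, not in the log).
tp, fn, fp, tn = 4, 2, 2, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * precision * recall / (precision + recall)
jcc = tp / (tp + fp + fn)  # Jaccard index of the positive class
# With hard class predictions, ROC AUC equals the mean of TPR and TNR.
roc_auc = (recall + tn / (tn + fp)) / 2
# Matthews correlation coefficient from the four counts.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("fscore", fscore), ("jcc", jcc),
                    ("roc_auc", roc_auc), ("mcc", mcc)]:
    print(f"{name}: {value:.8f}")
```

With only 13 samples per test fold, a single reclassified sample moves accuracy by about 0.077, which may explain part of the fold-to-fold spread seen in this log.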
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [2.01023936 1.5312047 1.59749556 1.66958928 1.76442599 1.60279465
1.98377275 1.63427329 1.65822673 1.49137998]
mean value: 1.6943402290344238
key: score_time
value: [0.21109867 0.20521808 0.27385473 0.18928909 0.19972277 0.15325022
0.16408372 0.16386247 0.17173505 0.19090343]
mean value: 0.1923018217086792
key: test_mcc
value: [0.23809524 0.6172134 0.69047619 0.6172134 0.38095238 1.
0.39477102 0.85391256 0.41475753 0.09759001]
mean value: 0.5304981728324353
key: train_mcc
value: [0.94994292 0.94994292 0.96636481 0.98304594 0.96636481 0.93384219
0.98305085 0.94998574 0.93161894 0.94998574]
mean value: 0.9564144838510643
key: test_accuracy
value: [0.61538462 0.76923077 0.84615385 0.76923077 0.69230769 1.
0.61538462 0.92307692 0.69230769 0.53846154]
mean value: 0.7461538461538462
key: train_accuracy
value: [0.97435897 0.97435897 0.98290598 0.99145299 0.98290598 0.96581197
0.99145299 0.97435897 0.96581197 0.97435897]
mean value: 0.9777777777777777
key: test_fscore
value: [0.61538462 0.8 0.83333333 0.8 0.66666667 1.
0.44444444 0.93333333 0.66666667 0.5 ]
mean value: 0.7259829059829059
key: train_fscore
value: [0.97520661 0.97520661 0.98333333 0.99159664 0.98333333 0.96666667
0.99145299 0.97478992 0.96551724 0.97478992]
mean value: 0.9781893259894366
key: test_precision
value: [0.57142857 0.66666667 0.83333333 0.66666667 0.66666667 1.
1. 0.875 0.8 0.6 ]
mean value: 0.7679761904761905
key: train_precision
value: [0.9516129 0.9516129 0.96721311 0.98333333 0.96721311 0.93548387
0.98305085 0.95081967 0.96551724 0.95081967]
mean value: 0.9606676673360117
key: test_recall
value: [0.66666667 1. 0.83333333 1. 0.66666667 1.
0.28571429 1. 0.57142857 0.42857143]
mean value: 0.7452380952380953
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.96551724 1. ]
mean value: 0.996551724137931
key: test_roc_auc
value: [0.61904762 0.78571429 0.8452381 0.78571429 0.69047619 1.
0.64285714 0.91666667 0.70238095 0.54761905]
mean value: 0.7535714285714286
key: train_roc_auc
value: [0.97413793 0.97413793 0.98275862 0.99137931 0.98275862 0.96610169
0.99152542 0.97457627 0.96580947 0.97457627]
mean value: 0.9777761542957335
key: test_jcc
value: [0.44444444 0.66666667 0.71428571 0.66666667 0.5 1.
0.28571429 0.875 0.5 0.33333333]
mean value: 0.5986111111111111
key: train_jcc
value: [0.9516129 0.9516129 0.96721311 0.98333333 0.96721311 0.93548387
0.98305085 0.95081967 0.93333333 0.95081967]
mean value: 0.957449276531414
MCC on Blind test: 0.58
Accuracy on Blind test: 0.8
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0101862 0.00909281 0.0090611 0.0091393 0.00983238 0.00917673
0.0095005 0.0091691 0.00907755 0.00950789]
mean value: 0.009374356269836426
key: score_time
value: [0.00923538 0.00892949 0.00899577 0.0089128 0.00890374 0.00899363
0.00957346 0.00951457 0.00898981 0.00949144]
mean value: 0.009154009819030761
key: test_mcc
value: [-0.05143445 0.54761905 0.53674504 -0.07142857 -0.28288947 0.54761905
0.28288947 0.38095238 0.38095238 -0.38575837]
mean value: 0.18852665009433853
key: train_mcc
value: [0.5393392 0.54074089 0.55597781 0.58971362 0.59133581 0.61080452
0.6087526 0.59794138 0.56027975 0.64168717]
mean value: 0.5836572739419282
key: test_accuracy
value: [0.46153846 0.76923077 0.76923077 0.46153846 0.38461538 0.76923077
0.61538462 0.69230769 0.69230769 0.30769231]
mean value: 0.5923076923076923
key: train_accuracy
value: [0.76923077 0.76923077 0.77777778 0.79487179 0.79487179 0.8034188
0.8034188 0.79487179 0.77777778 0.82051282]
mean value: 0.7905982905982906
key: test_fscore
value: [0.53333333 0.76923077 0.72727273 0.46153846 0.2 0.76923077
0.54545455 0.71428571 0.71428571 0.18181818]
mean value: 0.5616450216450216
key: train_fscore
value: [0.76521739 0.76106195 0.77586207 0.79661017 0.78947368 0.78899083
0.79279279 0.77358491 0.75925926 0.81415929]
mean value: 0.7817012336310473
key: test_precision
value: [0.44444444 0.71428571 0.8 0.42857143 0.25 0.83333333
0.75 0.71428571 0.71428571 0.25 ]
mean value: 0.589920634920635
key: train_precision
value: [0.78571429 0.7962963 0.78947368 0.79661017 0.81818182 0.84313725
0.83018868 0.85416667 0.82 0.83636364]
mean value: 0.8170132491071999
key: test_recall
value: [0.66666667 0.83333333 0.66666667 0.5 0.16666667 0.71428571
0.42857143 0.71428571 0.71428571 0.14285714]
mean value: 0.5547619047619048
key: train_recall
value: [0.74576271 0.72881356 0.76271186 0.79661017 0.76271186 0.74137931
0.75862069 0.70689655 0.70689655 0.79310345]
mean value: 0.7503506721215664
key: test_roc_auc
value: [0.47619048 0.77380952 0.76190476 0.46428571 0.36904762 0.77380952
0.63095238 0.69047619 0.69047619 0.32142857]
mean value: 0.5952380952380952
key: train_roc_auc
value: [0.76943308 0.76957919 0.77790766 0.79485681 0.79514904 0.80289305
0.80303916 0.79412624 0.77717709 0.82028054]
mean value: 0.7904441846873174
key: test_jcc
value: [0.36363636 0.625 0.57142857 0.3 0.11111111 0.625
0.375 0.55555556 0.55555556 0.1 ]
mean value: 0.4182287157287157
key: train_jcc
value: [0.61971831 0.61428571 0.63380282 0.66197183 0.65217391 0.65151515
0.65671642 0.63076923 0.6119403 0.68656716]
mean value: 0.6419460847957068
MCC on Blind test: 0.17
Accuracy on Blind test: 0.6
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [3.79892635 4.06916142 3.78275681 3.70944476 3.7265656 3.06774664
1.29508638 1.30997825 1.30333805 1.31682229]
mean value: 2.737982654571533
key: score_time
value: [0.03303289 0.02331901 0.03021288 0.01799679 0.02469015 0.01262164
0.01303506 0.01236606 0.0128386 0.01293612]
mean value: 0.019304919242858886
key: test_mcc
value: [0.23809524 0.73192505 1. 0.73192505 0.85714286 1.
0.73192505 1. 0.85714286 1. ]
mean value: 0.8148156116515152
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61538462 0.84615385 1. 0.84615385 0.92307692 1.
0.84615385 1. 0.92307692 1. ]
mean value: 0.9
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.61538462 0.85714286 1. 0.85714286 0.92307692 1.
0.83333333 1. 0.92307692 1. ]
mean value: 0.9009157509157508
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.57142857 0.75 1. 0.75 0.85714286 1.
1. 1. 1. 1. ]
mean value: 0.8928571428571428
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 1. 1. 1. 1. 1.
0.71428571 1. 0.85714286 1. ]
mean value: 0.9238095238095239
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.61904762 0.85714286 1. 0.85714286 0.92857143 1.
0.85714286 1. 0.92857143 1. ]
mean value: 0.9047619047619048
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.44444444 0.75 1. 0.75 0.85714286 1.
0.71428571 1. 0.85714286 1. ]
mean value: 0.8373015873015873
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
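Each "mean value" line is the arithmetic mean of the ten fold scores. A small sketch using the XGBoost test_mcc and train_mcc values copied from the block above also reports the fold-to-fold spread and the train/test gap; the variable names are illustrative and not taken from the original script.

```python
from statistics import mean, stdev

# Fold scores copied from the XGBoost block of this log.
xgb_test_mcc = [0.23809524, 0.73192505, 1.0, 0.73192505, 0.85714286,
                1.0, 0.73192505, 1.0, 0.85714286, 1.0]
xgb_train_mcc = [1.0] * 10

mean_test = mean(xgb_test_mcc)           # matches the logged "mean value"
spread = stdev(xgb_test_mcc)             # sample standard deviation across folds
gap = mean(xgb_train_mcc) - mean_test    # train/test gap as a rough overfitting signal

print(f"mean test MCC: {mean_test:.4f}")
print(f"fold std dev:  {spread:.4f}")
print(f"train - test:  {gap:.4f}")
```

The same three numbers can be computed for any other model block in the log to compare how stable each model is across folds.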
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.02560902 0.0465641 0.0467689 0.04626966 0.04703498 0.03659153
0.07281756 0.05043769 0.05987668 0.03430367]
mean value: 0.046627378463745116
key: score_time
value: [0.02470279 0.02194142 0.02393842 0.02235103 0.01229 0.0124898
0.02689171 0.02636933 0.01218748 0.01236296]
mean value: 0.01955249309539795
key: test_mcc
value: [ 0.28288947 0.23809524 -0.07142857 -0.09759001 0.23809524 -0.05143445
0.28288947 0.21957752 0.23809524 0.07142857]
mean value: 0.13506177232779207
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61538462 0.61538462 0.46153846 0.46153846 0.61538462 0.46153846
0.61538462 0.61538462 0.61538462 0.53846154]
mean value: 0.5615384615384615
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.61538462 0.46153846 0.36363636 0.61538462 0.36363636
0.54545455 0.66666667 0.61538462 0.57142857]
mean value: 0.5485181485181485
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.55555556 0.57142857 0.42857143 0.4 0.57142857 0.5
0.75 0.625 0.66666667 0.57142857]
mean value: 0.5640079365079365
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83333333 0.66666667 0.5 0.33333333 0.66666667 0.28571429
0.42857143 0.71428571 0.57142857 0.57142857]
mean value: 0.5571428571428572
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.63095238 0.61904762 0.46428571 0.45238095 0.61904762 0.47619048
0.63095238 0.60714286 0.61904762 0.53571429]
mean value: 0.5654761904761905
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.44444444 0.3 0.22222222 0.44444444 0.22222222
0.375 0.5 0.44444444 0.4 ]
mean value: 0.3852777777777778
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.39
Accuracy on Blind test: 0.67
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.03235841 0.00919366 0.00935245 0.0087781 0.00871897 0.00909734
0.00957131 0.0088644 0.00907183 0.00900698]
mean value: 0.01140134334564209
key: score_time
value: [0.01621103 0.00943017 0.00886726 0.00846529 0.00861454 0.00898194
0.00923371 0.00882483 0.0093236 0.00886536]
mean value: 0.009681773185729981
key: test_mcc
value: [ 0.41475753 0.23809524 0.69047619 0.09759001 -0.23809524 0.38095238
-0.07142857 0.38095238 0.54761905 -0.23809524]
mean value: 0.22028237287741703
key: train_mcc
value: [0.45433325 0.43583749 0.50423855 0.45295149 0.52149771 0.41876096
0.50511865 0.45433325 0.48858389 0.47019287]
mean value: 0.47058481235241056
key: test_accuracy
value: [0.69230769 0.61538462 0.84615385 0.53846154 0.38461538 0.69230769
0.46153846 0.69230769 0.76923077 0.38461538]
mean value: 0.6076923076923078
key: train_accuracy
value: [0.72649573 0.71794872 0.75213675 0.72649573 0.76068376 0.70940171
0.75213675 0.72649573 0.74358974 0.73504274]
mean value: 0.7350427350427351
key: test_fscore
value: [0.71428571 0.61538462 0.83333333 0.57142857 0.33333333 0.71428571
0.46153846 0.71428571 0.76923077 0.42857143]
mean value: 0.6155677655677656
key: train_fscore
value: [0.71929825 0.72268908 0.75630252 0.72881356 0.76666667 0.70689655
0.75630252 0.73333333 0.75 0.73504274]
mean value: 0.737534520935
key: test_precision
value: [0.625 0.57142857 0.83333333 0.5 0.33333333 0.71428571
0.5 0.71428571 0.83333333 0.42857143]
mean value: 0.6053571428571428
key: train_precision
value: [0.74545455 0.71666667 0.75 0.72881356 0.75409836 0.70689655
0.73770492 0.70967742 0.72580645 0.72881356]
mean value: 0.7303932032145685
key: test_recall
value: [0.83333333 0.66666667 0.83333333 0.66666667 0.33333333 0.71428571
0.42857143 0.71428571 0.71428571 0.42857143]
mean value: 0.6333333333333333
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
key: train_recall
value: [0.69491525 0.72881356 0.76271186 0.72881356 0.77966102 0.70689655
0.77586207 0.75862069 0.77586207 0.74137931]
mean value: 0.7453535943892461
key: test_roc_auc
value: [0.70238095 0.61904762 0.8452381 0.54761905 0.38095238 0.69047619
0.46428571 0.69047619 0.77380952 0.38095238]
mean value: 0.6095238095238096
key: train_roc_auc
value: [0.72676797 0.71785506 0.75204559 0.72647575 0.76052016 0.70938048
0.75233781 0.72676797 0.74386324 0.73509643]
mean value: 0.7351110461718293
key: test_jcc
value: [0.55555556 0.44444444 0.71428571 0.4 0.2 0.55555556
0.3 0.55555556 0.625 0.27272727]
mean value: 0.46231240981240984
key: train_jcc
value: [0.56164384 0.56578947 0.60810811 0.57333333 0.62162162 0.54666667
0.60810811 0.57894737 0.6 0.58108108]
mean value: 0.5845299596640621
MCC on Blind test: 0.12
Accuracy on Blind test: 0.6
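The UndefinedMetricWarning in this part of the log is raised when a model predicts no positive samples, so precision would be 0/0. The following is a minimal pure-Python sketch of that fallback; the function is hypothetical and only mirrors the idea behind the `zero_division` parameter named in the warning, not scikit-learn's actual implementation.

```python
def precision(tp: int, fp: int, zero_division: float = 0.0) -> float:
    """Precision with an explicit fallback when nothing is predicted positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # No positive predictions: the ratio is undefined, so return the fallback.
        return zero_division
    return tp / predicted_positive


print(precision(5, 2))                     # ordinary case
print(precision(0, 0))                     # undefined case, falls back to 0.0
print(precision(0, 0, zero_division=1.0))  # caller-chosen fallback
```

A fold scored with the 0.0 fallback pulls the mean precision down, so it may be worth checking which model and fold triggered the warning before comparing means.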
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01001859 0.01448417 0.01453733 0.01442242 0.01421142 0.01466846
0.01452231 0.01492524 0.03460717 0.01412797]
mean value: 0.016052508354187013
key: score_time
value: [0.00898385 0.01171184 0.01143146 0.01166272 0.01168752 0.01167083
0.01222038 0.01173306 0.01181865 0.01167536]
mean value: 0.011459565162658692
key: test_mcc
value: [ 0.39477102 0.59160798 0.54761905 -0.09759001 0.41475753 0.85714286
0. 0.41475753 0.39477102 -0.05143445]
mean value: 0.3466402521747625
key: train_mcc
value: [0.5256472 0.70108874 0.96580947 0.82695916 0.79157144 0.8524126
0.55242412 0.75475504 0.4300616 0.93214426]
mean value: 0.7332873650973463
key: test_accuracy
value: [0.61538462 0.76923077 0.76923077 0.46153846 0.69230769 0.92307692
0.46153846 0.69230769 0.61538462 0.46153846]
mean value: 0.6461538461538462
key: train_accuracy
value: [0.72649573 0.82905983 0.98290598 0.90598291 0.88888889 0.92307692
0.73504274 0.86324786 0.65811966 0.96581197]
mean value: 0.8478632478632478
key: test_fscore
value: [0.70588235 0.66666667 0.76923077 0.36363636 0.71428571 0.92307692
0. 0.66666667 0.44444444 0.36363636]
mean value: 0.5617526264585088
key: train_fscore
value: [0.78378378 0.79591837 0.98305085 0.89719626 0.89922481 0.92682927
0.63529412 0.84 0.47368421 0.96491228]
mean value: 0.8199893943639955
key: test_precision
value: [0.54545455 1. 0.71428571 0.4 0.625 1.
0. 0.8 1. 0.5 ]
mean value: 0.6584740259740259
key: train_precision
value: [0.65168539 1. 0.98305085 1. 0.82857143 0.87692308
1. 1. 1. 0.98214286]
mean value: 0.9322373603353417
key: test_recall
value: [1. 0.5 0.83333333 0.33333333 0.83333333 0.85714286
0. 0.57142857 0.28571429 0.28571429]
mean value: 0.55
key: train_recall
value: [0.98305085 0.66101695 0.98305085 0.81355932 0.98305085 0.98275862
0.46551724 0.72413793 0.31034483 0.94827586]
mean value: 0.7854763296317943
key: test_roc_auc
value: [0.64285714 0.75 0.77380952 0.45238095 0.70238095 0.92857143
0.5 0.70238095 0.64285714 0.47619048]
mean value: 0.6571428571428571
key: train_roc_auc
value: [0.72428404 0.83050847 0.98290473 0.90677966 0.88807715 0.9235827
0.73275862 0.86206897 0.65517241 0.96566335]
mean value: 0.8471800116890708
key: test_jcc
value: [0.54545455 0.5 0.625 0.22222222 0.55555556 0.85714286
0. 0.5 0.28571429 0.22222222]
mean value: 0.43133116883116884
key: train_jcc
value: [0.64444444 0.66101695 0.96666667 0.81355932 0.81690141 0.86363636
0.46551724 0.72413793 0.31034483 0.93220339]
mean value: 0.7198428544215129
MCC on Blind test: 0.48
Accuracy on Blind test: 0.73
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01358604 0.01421356 0.0135591 0.01319122 0.01309133 0.01321769
0.01343799 0.01352835 0.03366351 0.013412 ]
mean value: 0.015490078926086425
key: score_time
value: [0.01017308 0.01169968 0.0117135 0.0116725 0.01163697 0.01164961
0.0117476 0.01171494 0.01705313 0.01171875]
mean value: 0.012077975273132324
key: test_mcc
value: [0.03289758 0.54761905 0.7200823 0.23809524 0.23809524 0.54761905
0.09759001 0.41475753 0.54761905 0.05143445]
mean value: 0.3435809491904047
key: train_mcc
value: [0.70108874 0.96580947 0.79806402 0.79924461 0.76221784 0.8120433
0.88348376 0.82644112 0.83358601 0.75745182]
mean value: 0.8139430690182574
key: test_accuracy
value: [0.53846154 0.76923077 0.84615385 0.61538462 0.61538462 0.76923077
0.53846154 0.69230769 0.76923077 0.53846154]
mean value: 0.6692307692307693
key: train_accuracy
value: [0.82905983 0.98290598 0.88888889 0.8974359 0.87179487 0.90598291
0.94017094 0.90598291 0.91452991 0.87179487]
mean value: 0.9008547008547009
key: test_fscore
value: [0.25 0.76923077 0.8 0.61538462 0.61538462 0.76923077
0.5 0.66666667 0.76923077 0.625 ]
mean value: 0.6380128205128205
key: train_fscore
value: [0.79591837 0.98305085 0.87619048 0.89285714 0.88549618 0.90434783
0.93693694 0.8952381 0.91803279 0.88188976]
mean value: 0.8969958425985054
key: test_precision
value: [0.5 0.71428571 1. 0.57142857 0.57142857 0.83333333
0.6 0.8 0.83333333 0.55555556]
mean value: 0.697936507936508
key: train_precision
value: [1. 0.98305085 1. 0.94339623 0.80555556 0.9122807
0.98113208 1. 0.875 0.8115942 ]
mean value: 0.9312009609552911
key: test_recall
value: [0.16666667 0.83333333 0.66666667 0.66666667 0.66666667 0.71428571
0.42857143 0.57142857 0.71428571 0.71428571]
mean value: 0.6142857142857143
key: train_recall
value: [0.66101695 0.98305085 0.77966102 0.84745763 0.98305085 0.89655172
0.89655172 0.81034483 0.96551724 0.96551724]
mean value: 0.8788720046756283
key: test_roc_auc
value: [0.51190476 0.77380952 0.83333333 0.61904762 0.61904762 0.77380952
0.54761905 0.70238095 0.77380952 0.52380952]
mean value: 0.6678571428571429
key: train_roc_auc
value: [0.83050847 0.98290473 0.88983051 0.89786674 0.87083577 0.90590298
0.93980129 0.90517241 0.91496201 0.87258913]
mean value: 0.9010374050263005
key: test_jcc
value: [0.14285714 0.625 0.66666667 0.44444444 0.44444444 0.625
0.33333333 0.5 0.625 0.45454545]
mean value: 0.48612914862914863
key: train_jcc
value: [0.66101695 0.96666667 0.77966102 0.80645161 0.79452055 0.82539683
0.88135593 0.81034483 0.84848485 0.78873239]
mean value: 0.816263162165426
MCC on Blind test: 0.61
Accuracy on Blind test: 0.8
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.10403562 0.09558034 0.09651375 0.09247088 0.09357142 0.09638667
0.09789968 0.09478498 0.09305978 0.0943675 ]
mean value: 0.09586706161499023
key: score_time
value: [0.01546073 0.015697 0.01498866 0.01505232 0.01517606 0.01585627
0.01545739 0.01489782 0.01497364 0.01499128]
mean value: 0.015255117416381836
key: test_mcc
value: [0.73192505 0.85714286 0.85391256 0.73192505 0.85714286 1.
0.73192505 0.69047619 0.85391256 0.85714286]
mean value: 0.8165505053698895
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84615385 0.92307692 0.92307692 0.84615385 0.92307692 1.
0.84615385 0.84615385 0.92307692 0.92307692]
mean value: 0.9
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.92307692 0.90909091 0.85714286 0.92307692 1.
0.83333333 0.85714286 0.93333333 0.92307692]
mean value: 0.9016416916416916
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.85714286 1. 0.75 0.85714286 1.
1. 0.85714286 0.875 1. ]
mean value: 0.8946428571428571
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.83333333 1. 1. 1.
0.71428571 0.85714286 1. 0.85714286]
mean value: 0.9261904761904762
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.85714286 0.92857143 0.91666667 0.85714286 0.92857143 1.
0.85714286 0.8452381 0.91666667 0.92857143]
mean value: 0.9035714285714286
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.85714286 0.83333333 0.75 0.85714286 1.
0.71428571 0.75 0.875 0.85714286]
mean value: 0.8244047619047619
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03330159 0.04289031 0.03018212 0.03177643 0.03046227 0.03062987
0.03917623 0.05034637 0.03564668 0.03854513]
mean value: 0.03629570007324219
key: score_time
value: [0.02010465 0.01803446 0.02383947 0.01709294 0.01847744 0.01818728
0.03815055 0.02306819 0.02303863 0.02940965]
mean value: 0.02294032573699951
key: test_mcc
value: [0.38095238 0.73192505 0.85391256 0.54761905 0.85714286 0.85391256
0.73192505 1. 0.85391256 0.73192505]
mean value: 0.7543227141338386
key: train_mcc
value: [0.96580947 1. 1. 0.96580947 1. 0.96638414
1. 0.96638414 0.96638414 0.96580947]
mean value: 0.9796580822953629
key: test_accuracy
value: [0.69230769 0.84615385 0.92307692 0.76923077 0.92307692 0.92307692
0.84615385 1. 0.92307692 0.84615385]
mean value: 0.8692307692307693
key: train_accuracy
value: [0.98290598 1. 1. 0.98290598 1. 0.98290598
1. 0.98290598 0.98290598 0.98290598]
mean value: 0.9897435897435897
key: test_fscore
value: [0.66666667 0.85714286 0.90909091 0.76923077 0.92307692 0.93333333
0.83333333 1. 0.93333333 0.83333333]
mean value: 0.8658541458541458
key: train_fscore
value: [0.98305085 1. 1. 0.98305085 1. 0.98305085
1. 0.98305085 0.98305085 0.98275862]
mean value: 0.989801285797779
key: test_precision
value: [0.66666667 0.75 1. 0.71428571 0.85714286 0.875
1. 1. 0.875 1. ]
mean value: 0.8738095238095238
key: train_precision
value: [0.98305085 1. 1. 0.98305085 1. 0.96666667
1. 0.96666667 0.96666667 0.98275862]
mean value: 0.9848860315604909
key: test_recall
value: [0.66666667 1. 0.83333333 0.83333333 1. 1.
0.71428571 1. 1. 0.71428571]
mean value: 0.8761904761904762
key: train_recall
value: [0.98305085 1. 1. 0.98305085 1. 1.
1. 1. 1. 0.98275862]
mean value: 0.9948860315604909
key: test_roc_auc
value: [0.69047619 0.85714286 0.91666667 0.77380952 0.92857143 0.91666667
0.85714286 1. 0.91666667 0.85714286]
mean value: 0.8714285714285714
key: train_roc_auc
value: [0.98290473 1. 1. 0.98290473 1. 0.98305085
1. 0.98305085 0.98305085 0.98290473]
mean value: 0.9897866744593804
key: test_jcc
value: [0.5 0.75 0.83333333 0.625 0.85714286 0.875
0.71428571 1. 0.875 0.71428571]
mean value: 0.7744047619047619
key: train_jcc
value: [0.96666667 1. 1. 0.96666667 1. 0.96666667
1. 0.96666667 0.96666667 0.96610169]
mean value: 0.9799435028248588
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.03159785 0.0388267 0.0496254 0.0527494 0.06397605 0.05230594
0.03799987 0.04411817 0.04123425 0.05326414]
mean value: 0.04656977653503418
key: score_time
value: [0.0215857 0.03370619 0.04122758 0.03210497 0.02467465 0.02374005
0.0220952 0.02108288 0.02432179 0.02440858]
mean value: 0.026894760131835938
key: test_mcc
value: [ 0.23809524 0.53674504 0.07142857 0.21957752 -0.54761905 0.54761905
0.14085904 0.09759001 0.50709255 -0.23809524]
mean value: 0.15732927305504008
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61538462 0.76923077 0.53846154 0.61538462 0.23076923 0.76923077
0.53846154 0.53846154 0.69230769 0.38461538]
mean value: 0.5692307692307692
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.61538462 0.72727273 0.5 0.54545455 0.16666667 0.76923077
0.4 0.5 0.6 0.42857143]
mean value: 0.5252580752580752
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.57142857 0.8 0.5 0.6 0.16666667 0.83333333
0.66666667 0.6 1. 0.42857143]
mean value: 0.6166666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.66666667 0.5 0.5 0.16666667 0.71428571
0.28571429 0.42857143 0.42857143 0.42857143]
mean value: 0.47857142857142854
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.61904762 0.76190476 0.53571429 0.60714286 0.22619048 0.77380952
0.55952381 0.54761905 0.71428571 0.38095238]
mean value: 0.5726190476190476
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.44444444 0.57142857 0.33333333 0.375 0.09090909 0.625
0.25 0.33333333 0.42857143 0.27272727]
mean value: 0.37247474747474746
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.27359462 0.25197363 0.24951911 0.24853635 0.2603178 0.25177169
0.26295996 0.25529766 0.24311304 0.25735235]
mean value: 0.2554436206817627
key: score_time
value: [0.00972319 0.00941014 0.00939226 0.00937867 0.00994706 0.00936031
0.00973701 0.00932145 0.0103333 0.00949168]
mean value: 0.009609508514404296
key: test_mcc
value: [0.73192505 0.73192505 0.85391256 0.73192505 0.85714286 1.
0.73192505 1. 0.85714286 0.73192505]
mean value: 0.822782355167268
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84615385 0.84615385 0.92307692 0.84615385 0.92307692 1.
0.84615385 1. 0.92307692 0.84615385]
mean value: 0.9
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.85714286 0.90909091 0.85714286 0.92307692 1.
0.83333333 1. 0.92307692 0.83333333]
mean value: 0.8993339993339993
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.75 1. 0.75 0.85714286 1.
1. 1. 1. 1. ]
mean value: 0.9107142857142857
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.83333333 1. 1. 1.
0.71428571 1. 0.85714286 0.71428571]
mean value: 0.9119047619047619
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.85714286 0.85714286 0.91666667 0.85714286 0.92857143 1.
0.85714286 1. 0.92857143 0.85714286]
mean value: 0.905952380952381
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.75 0.83333333 0.75 0.85714286 1.
0.71428571 1. 0.85714286 0.71428571]
mean value: 0.8226190476190476
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.76
Accuracy on Blind test: 0.87
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01716733 0.01644683 0.01679516 0.01655555 0.01668715 0.0164144
0.01659274 0.01631236 0.01657629 0.01654887]
mean value: 0.016609668731689453
key: score_time
value: [0.01222205 0.01211405 0.01213098 0.01435637 0.01459837 0.01452422
0.01214623 0.0120852 0.01212072 0.01458526]
mean value: 0.013088345527648926
key: test_mcc
value: [-0.28288947 0.07142857 -0.21957752 0.09759001 -0.28288947 0.23809524
-0.22537447 0.05143445 -0.05143445 0.09759001]
mean value: -0.050602711008851185
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.38461538 0.53846154 0.38461538 0.53846154 0.38461538 0.61538462
0.38461538 0.53846154 0.46153846 0.53846154]
mean value: 0.47692307692307695
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.2 0.5 0.42857143 0.57142857 0.2 0.61538462
0.2 0.625 0.36363636 0.5 ]
mean value: 0.4204020979020979
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.25 0.5 0.375 0.5 0.25 0.66666667
0.33333333 0.55555556 0.5 0.6 ]
mean value: 0.45305555555555554
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.16666667 0.5 0.5 0.66666667 0.16666667 0.57142857
0.14285714 0.71428571 0.28571429 0.42857143]
mean value: 0.41428571428571426
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.36904762 0.53571429 0.39285714 0.54761905 0.36904762 0.61904762
0.4047619 0.52380952 0.47619048 0.54761905]
mean value: 0.4785714285714286
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.11111111 0.33333333 0.27272727 0.4 0.11111111 0.44444444
0.11111111 0.45454545 0.22222222 0.33333333]
mean value: 0.27939393939393936
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.12
Accuracy on Blind test: 0.4
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.04744411 0.03657794 0.03558445 0.03328729 0.03785634 0.03590202
0.03718376 0.03619504 0.0340116 0.02791476]
mean value: 0.0361957311630249
key: score_time
value: [0.02389693 0.0237062 0.02078629 0.02116799 0.02398634 0.023525
0.02397728 0.02406883 0.02208757 0.0217185 ]
mean value: 0.022892093658447264
key: test_mcc
value: [0.41475753 0.54761905 0.73192505 0.38095238 0.38095238 0.85714286
0.23809524 0.21957752 0.85391256 0.23809524]
mean value: 0.4863029808815056
key: train_mcc
value: [0.96580947 0.93214426 0.94884541 0.89792372 0.89792372 0.93161894
0.98305085 0.96580947 0.93161894 0.96580947]
mean value: 0.9420554222581047
key: test_accuracy
value: [0.69230769 0.76923077 0.84615385 0.69230769 0.69230769 0.92307692
0.61538462 0.61538462 0.92307692 0.61538462]
mean value: 0.7384615384615385
key: train_accuracy
value: [0.98290598 0.96581197 0.97435897 0.94871795 0.94871795 0.96581197
0.99145299 0.98290598 0.96581197 0.98290598]
mean value: 0.9709401709401709
key: test_fscore
value: [0.71428571 0.76923077 0.85714286 0.66666667 0.66666667 0.92307692
0.61538462 0.66666667 0.93333333 0.61538462]
mean value: 0.7427838827838827
key: train_fscore
value: [0.98305085 0.96666667 0.97478992 0.95 0.95 0.96551724
0.99145299 0.98275862 0.96551724 0.98275862]
mean value: 0.9712512145681603
key: test_precision
value: [0.625 0.71428571 0.75 0.66666667 0.66666667 1.
0.66666667 0.625 0.875 0.66666667]
mean value: 0.7255952380952381
key: train_precision
value: [0.98305085 0.95081967 0.96666667 0.93442623 0.93442623 0.96551724
0.98305085 0.98275862 0.96551724 0.98275862]
mean value: 0.9648992216867394
key: test_recall
value: [0.83333333 0.83333333 1. 0.66666667 0.66666667 0.85714286
0.57142857 0.71428571 1. 0.57142857]
mean value: 0.7714285714285715
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_sl.py:168: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_sl.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_recall
value: [0.98305085 0.98305085 0.98305085 0.96610169 0.96610169 0.96551724
1. 0.98275862 0.96551724 0.98275862]
mean value: 0.9777907656341321
key: test_roc_auc
value: [0.70238095 0.77380952 0.85714286 0.69047619 0.69047619 0.92857143
0.61904762 0.60714286 0.91666667 0.61904762]
mean value: 0.7404761904761905
key: train_roc_auc
value: [0.98290473 0.96566335 0.97428404 0.94856809 0.94856809 0.96580947
0.99152542 0.98290473 0.96580947 0.98290473]
mean value: 0.9708942139099942
key: test_jcc
value: [0.55555556 0.625 0.75 0.5 0.5 0.85714286
0.44444444 0.5 0.875 0.44444444]
mean value: 0.6051587301587301
key: train_jcc
value: [0.96666667 0.93548387 0.95081967 0.9047619 0.9047619 0.93333333
0.98305085 0.96610169 0.93333333 0.96610169]
mean value: 0.9444414923244168
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.31864738 0.31613731 0.37270212 0.30933118 0.3532021 0.42899418
0.34164643 0.27218747 0.27531123 0.27682853]
mean value: 0.32649879455566405
key: score_time
value: [0.03583121 0.02314615 0.02238393 0.02144551 0.02395582 0.02265859
0.02140379 0.02456188 0.02235389 0.02389455]
mean value: 0.024163532257080077
key: test_mcc
value: [0.41475753 0.54761905 0.73192505 0.38095238 0.38095238 0.85714286
0.23809524 0.21957752 0.85391256 0.23809524]
mean value: 0.4863029808815056
key: train_mcc
value: [0.96580947 0.98304594 0.94884541 0.89792372 0.89792372 0.93161894
1. 0.96580947 0.93161894 0.96580947]
mean value: 0.9488405049573129
key: test_accuracy
value: [0.69230769 0.76923077 0.84615385 0.69230769 0.69230769 0.92307692
0.61538462 0.61538462 0.92307692 0.61538462]
mean value: 0.7384615384615385
key: train_accuracy
value: [0.98290598 0.99145299 0.97435897 0.94871795 0.94871795 0.96581197
1. 0.98290598 0.96581197 0.98290598]
mean value: 0.9743589743589743
key: test_fscore
value: [0.71428571 0.76923077 0.85714286 0.66666667 0.66666667 0.92307692
0.61538462 0.66666667 0.93333333 0.61538462]
mean value: 0.7427838827838827
key: train_fscore
value: [0.98305085 0.99159664 0.97478992 0.95 0.95 0.96551724
1. 0.98275862 0.96551724 0.98275862]
mean value: 0.9745989126217407
key: test_precision
value: [0.625 0.71428571 0.75 0.66666667 0.66666667 1.
0.66666667 0.625 0.875 0.66666667]
mean value: 0.7255952380952381
key: train_precision
value: [0.98305085 0.98333333 0.96666667 0.93442623 0.93442623 0.96551724
1. 0.98275862 0.96551724 0.98275862]
mean value: 0.9698455030611952
key: test_recall
value: [0.83333333 0.83333333 1. 0.66666667 0.66666667 0.85714286
0.57142857 0.71428571 1. 0.57142857]
mean value: 0.7714285714285715
key: train_recall
value: [0.98305085 1. 0.98305085 0.96610169 0.96610169 0.96551724
1. 0.98275862 0.96551724 0.98275862]
mean value: 0.9794856808883694
key: test_roc_auc
value: [0.70238095 0.77380952 0.85714286 0.69047619 0.69047619 0.92857143
0.61904762 0.60714286 0.91666667 0.61904762]
mean value: 0.7404761904761905
key: train_roc_auc
value: [0.98290473 0.99137931 0.97428404 0.94856809 0.94856809 0.96580947
1. 0.98290473 0.96580947 0.98290473]
mean value: 0.9743132670952659
key: test_jcc
value: [0.55555556 0.625 0.75 0.5 0.5 0.85714286
0.44444444 0.5 0.875 0.44444444]
mean value: 0.6051587301587301
key: train_jcc
value: [0.96666667 0.98333333 0.95081967 0.9047619 0.9047619 0.93333333
1. 0.96610169 0.93333333 0.96610169]
mean value: 0.9509213538152133
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03482747 0.04060721 0.06156802 0.03389192 0.033283 0.03031182
0.04303527 0.03335857 0.04897904 0.02938032]
mean value: 0.03892426490783692
key: score_time
value: [0.01205015 0.01392388 0.02358866 0.0120697 0.01431918 0.01200342
0.01424956 0.01427317 0.01218128 0.01201129]
mean value: 0.01406702995300293
key: test_mcc
value: [0.53935989 0.52295779 0.71562645 0.80909091 0.53935989 0.80909091
0.23636364 0.80909091 0.63305416 0.82572282]
mean value: 0.6439717368137134
key: train_mcc
value: [0.88405964 0.82054446 0.85264475 0.84171619 0.89500244 0.87301232
0.92597156 0.852022 0.89495572 0.90519967]
mean value: 0.8745128753545499
key: test_accuracy
value: [0.76190476 0.76190476 0.85714286 0.9047619 0.76190476 0.9047619
0.61904762 0.9047619 0.80952381 0.9047619 ]
mean value: 0.819047619047619
key: train_accuracy
value: [0.94179894 0.91005291 0.92592593 0.92063492 0.94708995 0.93650794
0.96296296 0.92592593 0.94708995 0.95238095]
mean value: 0.937037037037037
key: test_fscore
value: [0.70588235 0.73684211 0.84210526 0.9 0.70588235 0.90909091
0.63636364 0.90909091 0.8 0.9 ]
mean value: 0.8045257528848859
key: train_fscore
value: [0.94117647 0.90909091 0.92473118 0.9197861 0.94623656 0.93617021
0.96256684 0.92473118 0.94565217 0.95135135]
mean value: 0.936149298361715
key: test_precision
value: [0.85714286 0.77777778 0.88888889 0.9 0.85714286 0.90909091
0.63636364 0.90909091 0.88888889 1. ]
mean value: 0.8624386724386724
key: train_precision
value: [0.95652174 0.92391304 0.94505495 0.93478261 0.96703297 0.93617021
0.96774194 0.93478261 0.96666667 0.96703297]
mean value: 0.9499699694037375
key: test_recall
value: [0.6 0.7 0.8 0.9 0.6 0.90909091
0.63636364 0.90909091 0.72727273 0.81818182]
mean value: 0.76
key: train_recall
value: [0.92631579 0.89473684 0.90526316 0.90526316 0.92631579 0.93617021
0.95744681 0.91489362 0.92553191 0.93617021]
mean value: 0.9228107502799552
key: test_roc_auc
value: [0.75454545 0.75909091 0.85454545 0.90454545 0.75454545 0.90454545
0.61818182 0.90454545 0.81363636 0.90909091]
mean value: 0.8177272727272727
key: train_roc_auc
value: [0.9418813 0.91013438 0.92603583 0.92071669 0.94720045 0.93650616
0.96293393 0.92586786 0.94697648 0.95229563]
mean value: 0.9370548712206047
key: test_jcc
value: [0.54545455 0.58333333 0.72727273 0.81818182 0.54545455 0.83333333
0.46666667 0.83333333 0.66666667 0.81818182]
mean value: 0.6837878787878788
key: train_jcc
value: [0.88888889 0.83333333 0.86 0.85148515 0.89795918 0.88
0.92783505 0.86 0.89690722 0.90721649]
mean value: 0.8803625317297141
MCC on Blind test: 0.61
Accuracy on Blind test: 0.8
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.94342947 1.26570821 0.92522335 1.06781173 0.90099669 0.92243218
0.97618771 0.7888 1.1742022 0.89259076]
mean value: 0.9857382297515869
key: score_time
value: [0.01842904 0.01664925 0.01643705 0.016366 0.01821828 0.02390528
0.01476598 0.01480126 0.01512694 0.01856089]
mean value: 0.017325997352600098
key: test_mcc
value: [0.74161985 0.82275335 0.74161985 0.82572282 0.80909091 0.71818182
0.23636364 1. 0.90829511 0.67419986]
mean value: 0.7477847204800199
key: train_mcc
value: [1. 0.98947368 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9989473684210526
key: test_accuracy
value: [0.85714286 0.9047619 0.85714286 0.9047619 0.9047619 0.85714286
0.61904762 1. 0.95238095 0.80952381]
mean value: 0.8666666666666667
key: train_accuracy
value: [1. 0.99470899 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9994708994708995
key: test_fscore
value: [0.82352941 0.88888889 0.82352941 0.90909091 0.9 0.85714286
0.63636364 1. 0.95652174 0.77777778]
mean value: 0.8572844631923916
key: train_fscore
value: [1. 0.99470899 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9994708994708995
key: test_precision
value: [1. 1. 1. 0.83333333 0.9 0.9
0.63636364 1. 0.91666667 1. ]
mean value: 0.9186363636363637
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.7 0.8 0.7 1. 0.9 0.81818182
0.63636364 1. 1. 0.63636364]
mean value: 0.8190909090909091
key: train_recall
value: [1. 0.98947368 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9989473684210526
key: test_roc_auc
value: [0.85 0.9 0.85 0.90909091 0.90454545 0.85909091
0.61818182 1. 0.95 0.81818182]
mean value: 0.8659090909090909
key: train_roc_auc
value: [1. 0.99473684 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9994736842105263
key: test_jcc
value: [0.7 0.8 0.7 0.83333333 0.81818182 0.75
0.46666667 1. 0.91666667 0.63636364]
mean value: 0.7621212121212121
key: train_jcc
value: [1. 0.98947368 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9989473684210526
MCC on Blind test: 0.6
Accuracy on Blind test: 0.8
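
The per-fold values logged above can be re-aggregated to check the reported means and to quantify the gap between training and test scores. The following is a minimal sketch, not part of the original run: the helper name parse_cv_block and the SAMPLE string are hypothetical, and the parser assumes the "key: / value: / mean value:" layout seen in this log, with array values possibly wrapped onto a continuation line. The numbers in SAMPLE are copied from the LogisticRegressionCV block above.

```python
from statistics import fmean

# Excerpt copied from the LogisticRegressionCV block of this log.
SAMPLE = """\
key: test_mcc
value: [0.74161985 0.82275335 0.74161985 0.82572282 0.80909091 0.71818182
0.23636364 1. 0.90829511 0.67419986]
mean value: 0.7477847204800199
key: train_mcc
value: [1. 0.98947368 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9989473684210526
"""


def parse_cv_block(text):
    """Hypothetical helper: map each key to (per-fold values, logged mean)."""
    results = {}
    key, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("key:"):
            key, chunks = line[len("key:"):].strip(), []
        elif line.startswith("mean value:"):
            # Checked before "value:" handling; closes the current key.
            if key is not None:
                joined = " ".join(chunks).replace("[", " ").replace("]", " ")
                values = [float(tok) for tok in joined.split()]
                logged_mean = float(line[len("mean value:"):])
                results[key] = (values, logged_mean)
            key, chunks = None, []
        elif line.startswith("value:"):
            chunks = [line[len("value:"):]]
        elif key is not None and chunks:
            # Continuation line of a wrapped NumPy array.
            chunks.append(line)
    return results


scores = parse_cv_block(SAMPLE)
test_vals, test_mean = scores["test_mcc"]
train_vals, train_mean = scores["train_mcc"]

# Difference between mean train and mean test MCC across the 10 folds.
gap = fmean(train_vals) - fmean(test_vals)
# Fold with the lowest test MCC (0-based index).
worst_fold = min(range(len(test_vals)), key=test_vals.__getitem__)

print(f"recomputed test_mcc mean: {fmean(test_vals):.6f} (logged {test_mean:.6f})")
print(f"train/test MCC gap: {gap:.4f}; weakest fold index: {worst_fold}")
```

Because the log prints fold values rounded to eight decimals, the recomputed means agree with the logged means only up to that rounding. A train MCC near 1.0 alongside a noticeably lower test MCC is the kind of gap this check is meant to surface.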
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.02358484 0.00938153 0.00928259 0.00879526 0.00884533 0.00888205
0.00890446 0.00898314 0.01293993 0.01311731]
mean value: 0.01127164363861084
key: score_time
value: [0.01048207 0.00914145 0.00899744 0.00867033 0.00864816 0.00866389
0.00865364 0.00863838 0.01307774 0.01077056]
mean value: 0.009574365615844727
key: test_mcc
value: [0.35527986 0.23636364 0.60302269 0.35527986 0.21968621 0.58630197
0.13858047 0.42727273 0.45226702 0.24120908]
mean value: 0.3615263506116118
key: train_mcc
value: [0.43194158 0.37741808 0.40940239 0.42563559 0.39658396 0.43701355
0.42107287 0.48299607 0.34600551 0.40559385]
mean value: 0.41336634360677543
key: test_accuracy
value: [0.66666667 0.61904762 0.76190476 0.66666667 0.57142857 0.76190476
0.57142857 0.71428571 0.71428571 0.61904762]
mean value: 0.6666666666666666
key: train_accuracy
value: [0.7037037 0.66666667 0.69312169 0.7037037 0.68783069 0.70899471
0.6984127 0.74074074 0.62962963 0.68783069]
mean value: 0.692063492063492
key: test_fscore
value: [0.69565217 0.6 0.8 0.69565217 0.66666667 0.81481481
0.66666667 0.72727273 0.76923077 0.69230769]
mean value: 0.7128263684785424
key: train_fscore
value: [0.74774775 0.73191489 0.73873874 0.74311927 0.73303167 0.74418605
0.73972603 0.74871795 0.72 0.73542601]
mean value: 0.7382608351962145
key: test_precision
value: [0.61538462 0.6 0.66666667 0.61538462 0.52941176 0.6875
0.5625 0.72727273 0.66666667 0.6 ]
mean value: 0.6270787056081174
key: train_precision
value: [0.65354331 0.61428571 0.64566929 0.65853659 0.64285714 0.66115702
0.648 0.72277228 0.57692308 0.63565891]
mean value: 0.6459403334606778
key: test_recall
value: [0.8 0.6 1. 0.8 0.9 1.
0.81818182 0.72727273 0.90909091 0.81818182]
mean value: 0.8372727272727273
key: train_recall
value: [0.87368421 0.90526316 0.86315789 0.85263158 0.85263158 0.85106383
0.86170213 0.77659574 0.95744681 0.87234043]
mean value: 0.8666517357222845
key: test_roc_auc
value: [0.67272727 0.61818182 0.77272727 0.67272727 0.58636364 0.75
0.55909091 0.71363636 0.70454545 0.60909091]
mean value: 0.6659090909090909
key: train_roc_auc
value: [0.70279955 0.66539754 0.69221725 0.70291153 0.68695409 0.70974244
0.69927212 0.74092945 0.63135498 0.68880179]
mean value: 0.6920380739081747
key: test_jcc
value: [0.53333333 0.42857143 0.66666667 0.53333333 0.5 0.6875
0.5 0.57142857 0.625 0.52941176]
mean value: 0.5575245098039215
key: train_jcc
value: [0.5971223 0.57718121 0.58571429 0.59124088 0.57857143 0.59259259
0.58695652 0.59836066 0.5625 0.58156028]
mean value: 0.5851800154167459
MCC on Blind test: 0.12
Accuracy on Blind test: 0.6
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01514578 0.0143559 0.01509309 0.01282573 0.00945067 0.00921535
0.00978017 0.00969481 0.00936055 0.00989962]
mean value: 0.011482167243957519
key: score_time
value: [0.01314116 0.01436496 0.014256 0.01244879 0.00897789 0.00877929
0.00870728 0.00949979 0.00937581 0.00883293]
mean value: 0.01083838939666748
key: test_mcc
value: [ 0.23636364 -0.06741999 0.61818182 0.53935989 -0.04545455 0.24771685
0.03739788 0.33636364 0.55161872 0.24771685]
mean value: 0.2701844747191005
key: train_mcc
value: [0.44056694 0.44988241 0.45158615 0.40571724 0.47104939 0.40741573
0.48340519 0.41808615 0.40741573 0.50382186]
mean value: 0.4438946787181437
key: test_accuracy
value: [0.61904762 0.47619048 0.80952381 0.76190476 0.47619048 0.61904762
0.52380952 0.66666667 0.76190476 0.61904762]
mean value: 0.6333333333333333
key: train_accuracy
value: [0.71957672 0.72486772 0.72486772 0.6984127 0.73544974 0.7037037
0.74074074 0.70899471 0.7037037 0.75132275]
mean value: 0.7211640211640211
key: test_fscore
value: [0.6 0.35294118 0.8 0.70588235 0.47619048 0.6
0.58333333 0.66666667 0.73684211 0.6 ]
mean value: 0.6121856110865399
key: train_fscore
value: [0.71038251 0.72340426 0.71428571 0.66666667 0.73404255 0.69892473
0.72625698 0.7027027 0.69892473 0.74033149]
mean value: 0.7115922343145447
key: test_precision
value: [0.6 0.42857143 0.8 0.85714286 0.45454545 0.66666667
0.53846154 0.7 0.875 0.66666667]
mean value: 0.6587054612054611
key: train_precision
value: [0.73863636 0.7311828 0.74712644 0.75 0.74193548 0.70652174
0.76470588 0.71428571 0.70652174 0.77011494]
mean value: 0.7371031097416126
key: test_recall
value: [0.6 0.3 0.8 0.6 0.5 0.54545455
0.63636364 0.63636364 0.63636364 0.54545455]
mean value: 0.58
key: train_recall
value: [0.68421053 0.71578947 0.68421053 0.6 0.72631579 0.69148936
0.69148936 0.69148936 0.69148936 0.71276596]
mean value: 0.6889249720044793
key: test_roc_auc
value: [0.61818182 0.46818182 0.80909091 0.75454545 0.47727273 0.62272727
0.51818182 0.66818182 0.76818182 0.62272727]
mean value: 0.6327272727272727
key: train_roc_auc
value: [0.71976484 0.72491601 0.72508399 0.69893617 0.73549832 0.70363942
0.74048152 0.70890258 0.70363942 0.75111982]
mean value: 0.7211982082866741
key: test_jcc
value: [0.42857143 0.21428571 0.66666667 0.54545455 0.3125 0.42857143
0.41176471 0.5 0.58333333 0.42857143]
mean value: 0.45197192513368983
key: train_jcc
value: [0.55084746 0.56666667 0.55555556 0.5 0.57983193 0.53719008
0.57017544 0.54166667 0.53719008 0.5877193 ]
mean value: 0.5526843181420478
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01027989 0.01051307 0.00920153 0.00923085 0.00902748 0.00923491
0.01056838 0.01076412 0.01129293 0.01068258]
mean value: 0.010079574584960938
key: score_time
value: [0.01671553 0.01708245 0.01061392 0.01051021 0.01748228 0.01556611
0.01171184 0.01431727 0.01263571 0.01222897]
mean value: 0.013886427879333496
key: test_mcc
value: [-0.14545455 0.13483997 0.23373675 0.14545455 -0.39196475 -0.24771685
-0.33709993 0.33636364 0.15894099 0.05504819]
mean value: -0.0057851993419990025
key: train_mcc
value: [0.45024663 0.42906778 0.42923006 0.46035834 0.43243527 0.49287375
0.50280155 0.43919373 0.48199732 0.43065616]
mean value: 0.45488605817766276
key: test_accuracy
value: [0.42857143 0.57142857 0.61904762 0.57142857 0.33333333 0.38095238
0.33333333 0.66666667 0.57142857 0.52380952]
mean value: 0.5
key: train_accuracy
value: [0.72486772 0.71428571 0.71428571 0.73015873 0.71428571 0.74603175
0.75132275 0.71957672 0.74074074 0.71428571]
mean value: 0.726984126984127
key: test_fscore
value: [0.4 0.47058824 0.55555556 0.57142857 0.125 0.43478261
0.22222222 0.66666667 0.52631579 0.5 ]
mean value: 0.447255964933647
key: train_fscore
value: [0.72043011 0.70967742 0.7244898 0.73015873 0.69662921 0.73626374
0.74594595 0.71957672 0.73224044 0.69662921]
mean value: 0.7212041318869982
key: test_precision
value: [0.4 0.57142857 0.625 0.54545455 0.16666667 0.41666667
0.28571429 0.7 0.625 0.55555556]
mean value: 0.48914862914862917
key: train_precision
value: [0.73626374 0.72527473 0.7029703 0.73404255 0.74698795 0.76136364
0.75824176 0.71578947 0.75280899 0.73809524]
mean value: 0.7371838358715771
key: test_recall
value: [0.4 0.4 0.5 0.6 0.1 0.45454545
0.18181818 0.63636364 0.45454545 0.45454545]
mean value: 0.41818181818181815
key: train_recall
value: [0.70526316 0.69473684 0.74736842 0.72631579 0.65263158 0.71276596
0.73404255 0.72340426 0.71276596 0.65957447]
mean value: 0.7068868980963046
key: test_roc_auc
value: [0.42727273 0.56363636 0.61363636 0.57272727 0.32272727 0.37727273
0.34090909 0.66818182 0.57727273 0.52727273]
mean value: 0.49909090909090903
key: train_roc_auc
value: [0.724972 0.7143897 0.71410974 0.73017917 0.71461366 0.74585666
0.7512318 0.71959686 0.74059351 0.71399776]
mean value: 0.7269540873460246
key: test_jcc
value: [0.25 0.30769231 0.38461538 0.4 0.06666667 0.27777778
0.125 0.5 0.35714286 0.33333333]
mean value: 0.30022283272283273
key: train_jcc
value: [0.56302521 0.55 0.568 0.575 0.53448276 0.5826087
0.59482759 0.56198347 0.57758621 0.53448276]
mean value: 0.5641996687155415
MCC on Blind test: 0.17
Accuracy on Blind test: 0.6
Model_name: SVM
Model func: SVC(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01432991 0.01354694 0.01185727 0.01176095 0.01348209 0.01326251
0.01258087 0.01316094 0.01310778 0.01367092]
mean value: 0.013076019287109376
key: score_time
value: [0.01142573 0.01127124 0.00964952 0.0101819 0.01053739 0.01043487
0.01040888 0.01022005 0.01001334 0.01339149]
mean value: 0.010753440856933593
key: test_mcc
value: [ 0.42727273 -0.06741999 0.80909091 0.06741999 0.13762047 0.42727273
0.04545455 0.52295779 0.71818182 0.61818182]
mean value: 0.37060328045303614
key: train_mcc
value: [0.69399986 0.73585755 0.80972114 0.74663724 0.75702928 0.78850682
0.8636019 0.77999992 0.79896965 0.79930542]
mean value: 0.7773628790574287
key: test_accuracy
value: [0.71428571 0.47619048 0.9047619 0.52380952 0.57142857 0.71428571
0.52380952 0.76190476 0.85714286 0.80952381]
mean value: 0.6857142857142857
key: train_accuracy
value: [0.84656085 0.86772487 0.9047619 0.87301587 0.87830688 0.89417989
0.93121693 0.88888889 0.8994709 0.8994709 ]
mean value: 0.8883597883597883
key: test_fscore
value: [0.7 0.35294118 0.9 0.58333333 0.52631579 0.72727273
0.54545455 0.7826087 0.85714286 0.81818182]
mean value: 0.6793250942981728
key: train_fscore
value: [0.85128205 0.86631016 0.90425532 0.87628866 0.87700535 0.89247312
0.92896175 0.89230769 0.89839572 0.8972973 ]
mean value: 0.8884577116689765
key: test_precision
value: [0.7 0.42857143 0.9 0.5 0.55555556 0.72727273
0.54545455 0.75 0.9 0.81818182]
mean value: 0.6825036075036075
key: train_precision
value: [0.83 0.88043478 0.91397849 0.85858586 0.89130435 0.90217391
0.95505618 0.86138614 0.90322581 0.91208791]
mean value: 0.8908233433616443
key: test_recall
value: [0.7 0.3 0.9 0.7 0.5 0.72727273
0.54545455 0.81818182 0.81818182 0.81818182]
mean value: 0.6827272727272727
key: train_recall
value: [0.87368421 0.85263158 0.89473684 0.89473684 0.86315789 0.88297872
0.90425532 0.92553191 0.89361702 0.88297872]
mean value: 0.8868309070548712
key: test_roc_auc
value: [0.71363636 0.46818182 0.90454545 0.53181818 0.56818182 0.71363636
0.52272727 0.75909091 0.85909091 0.80909091]
mean value: 0.6849999999999999
key: train_roc_auc
value: [0.84641657 0.86780515 0.90481523 0.87290034 0.87838746 0.89412094
0.93107503 0.88908175 0.89944009 0.8993841 ]
mean value: 0.8883426651735722
key: test_jcc
value: [0.53846154 0.21428571 0.81818182 0.41176471 0.35714286 0.57142857
0.375 0.64285714 0.75 0.69230769]
mean value: 0.5371430040547688
key: train_jcc
value: [0.74107143 0.76415094 0.82524272 0.77981651 0.78095238 0.80582524
0.86734694 0.80555556 0.81553398 0.81372549]
mean value: 0.7999221192956221
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.33641791 1.50435567 1.61624503 1.38623667 1.66725993 1.31514406
1.19694066 1.21551633 1.49928236 1.54568815]
mean value: 1.4283086776733398
key: score_time
value: [0.01264334 0.01469707 0.02935648 0.02213526 0.01919293 0.01318979
0.01264215 0.01259017 0.01352763 0.01921105]
mean value: 0.016918587684631347
key: test_mcc
value: [0.43007562 0.53935989 0.90829511 0.80909091 0.71818182 0.4719399
0.33028913 0.80909091 0.63305416 0.67419986]
mean value: 0.6323577308640445
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71428571 0.76190476 0.95238095 0.9047619 0.85714286 0.71428571
0.66666667 0.9047619 0.80952381 0.80952381]
mean value: 0.8095238095238095
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.70588235 0.94736842 0.9 0.85714286 0.66666667
0.69565217 0.90909091 0.8 0.77777778]
mean value: 0.7926247825251729
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
key: test_precision
value: [0.75 0.85714286 1. 0.9 0.81818182 0.85714286
0.66666667 0.90909091 0.88888889 1. ]
mean value: 0.8647113997113997
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6 0.6 0.9 0.9 0.9 0.54545455
0.72727273 0.90909091 0.72727273 0.63636364]
mean value: 0.7445454545454545
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.70909091 0.75454545 0.95 0.90454545 0.85909091 0.72272727
0.66363636 0.90454545 0.81363636 0.81818182]
mean value: 0.8099999999999999
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.54545455 0.9 0.81818182 0.75 0.5
0.53333333 0.83333333 0.66666667 0.63636364]
mean value: 0.6683333333333333
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.6
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01775002 0.01362348 0.01367092 0.01293755 0.01265097 0.01258326
0.01258826 0.01409173 0.01307774 0.01271462]
mean value: 0.013568854331970215
key: score_time
value: [0.01175952 0.00916195 0.00893497 0.00862074 0.00865817 0.00863695
0.0087688 0.00872326 0.00884724 0.00894094]
mean value: 0.009105253219604491
key: test_mcc
value: [0.82275335 0.82275335 1. 0.90829511 1. 1.
0.42727273 0.90909091 0.82572282 0.80909091]
mean value: 0.8524979177943448
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9047619 0.9047619 1. 0.95238095 1. 1.
0.71428571 0.95238095 0.9047619 0.9047619 ]
mean value: 0.9238095238095239
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.88888889 1. 0.94736842 1. 1.
0.72727273 0.95238095 0.9 0.90909091]
mean value: 0.9213890787574999
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 1. 1.
0.72727273 1. 1. 0.90909091]
mean value: 0.9636363636363636
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.8 1. 0.9 1. 1.
0.72727273 0.90909091 0.81818182 0.90909091]
mean value: 0.8863636363636364
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 0.9 1. 0.95 1. 1.
0.71363636 0.95454545 0.90909091 0.90454545]
mean value: 0.9231818181818182
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.8 1. 0.9 1. 1.
0.57142857 0.90909091 0.81818182 0.83333333]
mean value: 0.8632034632034632
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.49
Accuracy on Blind test: 0.73
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.09987569 0.10673809 0.11479974 0.10427046 0.10179067 0.12284112
0.14259434 0.10524893 0.10025811 0.09575725]
mean value: 0.10941743850708008
key: score_time
value: [0.02278996 0.02107644 0.02003169 0.01899815 0.01821494 0.0267005
0.02168226 0.01780367 0.01765037 0.01754332]
mean value: 0.020249128341674805
key: test_mcc
value: [0.82275335 0.71562645 0.90829511 0.61818182 0.82572282 0.52727273
0.23636364 0.80909091 0.71818182 0.90909091]
mean value: 0.7090579546795412
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9047619 0.85714286 0.95238095 0.80952381 0.9047619 0.76190476
0.61904762 0.9047619 0.85714286 0.95238095]
mean value: 0.8523809523809524
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.84210526 0.94736842 0.8 0.90909091 0.76190476
0.63636364 0.90909091 0.85714286 0.95238095]
mean value: 0.8504336599073441
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.88888889 1. 0.8 0.83333333 0.8
0.63636364 0.90909091 0.9 1. ]
mean value: 0.8767676767676768
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.8 0.9 0.8 1. 0.72727273
0.63636364 0.90909091 0.81818182 0.90909091]
mean value: 0.8300000000000001
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 0.85454545 0.95 0.80909091 0.90909091 0.76363636
0.61818182 0.90454545 0.85909091 0.95454545]
mean value: 0.8522727272727273
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.72727273 0.9 0.66666667 0.83333333 0.61538462
0.46666667 0.83333333 0.75 0.90909091]
mean value: 0.7501748251748251
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.29
Accuracy on Blind test: 0.67
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01057601 0.01102757 0.01102114 0.01060414 0.0107491 0.01077795
0.01005793 0.01006055 0.01025558 0.00970602]
mean value: 0.010483598709106446
key: score_time
value: [0.01038694 0.01018834 0.00997496 0.00985217 0.00996852 0.00999904
0.00966644 0.00945711 0.00876474 0.0086937 ]
mean value: 0.009695196151733398
key: test_mcc
value: [0.43007562 0.58630197 0.80909091 0.44038551 0.52295779 0.35527986
0.13762047 0.26967994 0.52727273 0.67419986]
mean value: 0.47528646505235683
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71428571 0.76190476 0.9047619 0.71428571 0.76190476 0.66666667
0.57142857 0.61904762 0.76190476 0.80952381]
mean value: 0.7285714285714285
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.66666667 0.9 0.72727273 0.73684211 0.63157895
0.60869565 0.55555556 0.76190476 0.77777778]
mean value: 0.7032960860649647
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 1. 0.9 0.66666667 0.77777778 0.75
0.58333333 0.71428571 0.8 1. ]
mean value: 0.7942063492063492
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6 0.5 0.9 0.8 0.7 0.54545455
0.63636364 0.45454545 0.72727273 0.63636364]
mean value: 0.65
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.70909091 0.75 0.90454545 0.71818182 0.75909091 0.67272727
0.56818182 0.62727273 0.76363636 0.81818182]
mean value: 0.7290909090909091
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.5 0.81818182 0.57142857 0.58333333 0.46153846
0.4375 0.38461538 0.61538462 0.63636364]
mean value: 0.5508345820845821
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.32293153 1.42478061 1.48231292 1.47654915 1.48469901 1.48599815
1.46488094 1.48408747 1.48369265 1.48373985]
mean value: 1.4593672275543212
key: score_time
value: [0.10207772 0.10522914 0.10460138 0.10517383 0.10500145 0.10545468
0.10445786 0.10404372 0.1047914 0.10368395]
mean value: 0.10445151329040528
key: test_mcc
value: [0.66332496 0.53935989 0.90829511 0.71562645 0.90909091 0.90829511
0.33028913 0.90909091 0.90829511 0.90909091]
mean value: 0.7700758470872185
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.80952381 0.76190476 0.95238095 0.85714286 0.95238095 0.95238095
0.66666667 0.95238095 0.95238095 0.95238095]
mean value: 0.8809523809523809
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75 0.70588235 0.94736842 0.84210526 0.95238095 0.95652174
0.69565217 0.95238095 0.95652174 0.95238095]
mean value: 0.8711194546468473
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.85714286 1. 0.88888889 0.90909091 0.91666667
0.66666667 1. 0.91666667 1. ]
mean value: 0.9155122655122655
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6 0.6 0.9 0.8 1. 1.
0.72727273 0.90909091 1. 0.90909091]
mean value: 0.8445454545454545
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8 0.75454545 0.95 0.85454545 0.95454545 0.95
0.66363636 0.95454545 0.95 0.95454545]
mean value: 0.8786363636363637
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: test_jcc
value: [0.6 0.54545455 0.9 0.72727273 0.90909091 0.91666667
0.53333333 0.90909091 0.91666667 0.90909091]
mean value: 0.7866666666666666
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.00507212 1.5582056 1.35330391 0.88756967 0.90433073 1.32030916
1.99715137 0.90522766 0.9060688 0.88190556]
mean value: 1.1719144582748413
key: score_time
value: [0.14654708 0.13812089 0.14643788 0.17348289 0.18689179 0.25120091
0.13585401 0.22276163 0.19270754 0.18566346]
mean value: 0.17796680927276612
key: test_mcc
value: [0.74161985 0.52295779 0.82275335 0.53935989 0.71818182 0.90829511
0.13762047 0.82572282 0.71818182 0.71818182]
mean value: 0.6652874733582891
key: train_mcc
value: [0.95767077 0.95788064 0.96830553 0.95767077 0.95788064 0.96830907
0.98947368 0.96830907 0.96830553 0.95789003]
mean value: 0.9651695734907783
key: test_accuracy
value: [0.85714286 0.76190476 0.9047619 0.76190476 0.85714286 0.95238095
0.57142857 0.9047619 0.85714286 0.85714286]
mean value: 0.8285714285714285
key: train_accuracy
value: [0.97883598 0.97883598 0.98412698 0.97883598 0.97883598 0.98412698
0.99470899 0.98412698 0.98412698 0.97883598]
mean value: 0.9825396825396825
key: test_fscore
value: [0.82352941 0.73684211 0.88888889 0.70588235 0.85714286 0.95652174
0.60869565 0.9 0.85714286 0.85714286]
mean value: 0.8191788721590848
key: train_fscore
value: [0.97894737 0.97916667 0.98429319 0.97894737 0.97916667 0.98412698
0.99470899 0.98412698 0.98395722 0.97894737]
mean value: 0.9826388814528069
key: test_precision
value: [1. 0.77777778 1. 0.85714286 0.81818182 0.91666667
0.58333333 1. 0.9 0.9 ]
mean value: 0.8753102453102453
key: train_precision
value: [0.97894737 0.96907216 0.97916667 0.97894737 0.96907216 0.97894737
0.98947368 0.97894737 0.98924731 0.96875 ]
mean value: 0.9780571466286268
key: test_recall
value: [0.7 0.7 0.8 0.6 0.9 1.
0.63636364 0.81818182 0.81818182 0.81818182]
mean value: 0.7790909090909091
key: train_recall
value: [0.97894737 0.98947368 0.98947368 0.97894737 0.98947368 0.9893617
1. 0.9893617 0.9787234 0.9893617 ]
mean value: 0.9873124300111982
key: test_roc_auc
value: [0.85 0.75909091 0.9 0.75454545 0.85909091 0.95
0.56818182 0.90909091 0.85909091 0.85909091]
mean value: 0.8268181818181818
key: train_roc_auc
value: [0.97883539 0.9787794 0.98409854 0.97883539 0.9787794 0.98415454
0.99473684 0.98415454 0.98409854 0.97889138]
mean value: 0.9825363941769317
key: test_jcc
value: [0.7 0.58333333 0.8 0.54545455 0.75 0.91666667
0.4375 0.81818182 0.75 0.75 ]
mean value: 0.7051136363636363
key: train_jcc
value: [0.95876289 0.95918367 0.96907216 0.95876289 0.95918367 0.96875
0.98947368 0.96875 0.96842105 0.95876289]
mean value: 0.9659122908523149
MCC on Blind test: 0.58
Accuracy on Blind test: 0.8
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02238894 0.00916934 0.00952101 0.01010132 0.01035404 0.01048112
0.01021218 0.01014829 0.00954199 0.01029658]
mean value: 0.011221480369567872
key: score_time
value: [0.00930572 0.0087676 0.00974298 0.00951648 0.00947809 0.00952435
0.00950789 0.00956464 0.00935626 0.00951338]
mean value: 0.009427738189697266
key: test_mcc
value: [ 0.23636364 -0.06741999 0.61818182 0.53935989 -0.04545455 0.24771685
0.03739788 0.33636364 0.55161872 0.24771685]
mean value: 0.2701844747191005
key: train_mcc
value: [0.44056694 0.44988241 0.45158615 0.40571724 0.47104939 0.40741573
0.48340519 0.41808615 0.40741573 0.50382186]
mean value: 0.4438946787181437
key: test_accuracy
value: [0.61904762 0.47619048 0.80952381 0.76190476 0.47619048 0.61904762
0.52380952 0.66666667 0.76190476 0.61904762]
mean value: 0.6333333333333333
key: train_accuracy
value: [0.71957672 0.72486772 0.72486772 0.6984127 0.73544974 0.7037037
0.74074074 0.70899471 0.7037037 0.75132275]
mean value: 0.7211640211640211
key: test_fscore
value: [0.6 0.35294118 0.8 0.70588235 0.47619048 0.6
0.58333333 0.66666667 0.73684211 0.6 ]
mean value: 0.6121856110865399
key: train_fscore
value: [0.71038251 0.72340426 0.71428571 0.66666667 0.73404255 0.69892473
0.72625698 0.7027027 0.69892473 0.74033149]
mean value: 0.7115922343145447
key: test_precision
value: [0.6 0.42857143 0.8 0.85714286 0.45454545 0.66666667
0.53846154 0.7 0.875 0.66666667]
mean value: 0.6587054612054611
key: train_precision
value: [0.73863636 0.7311828 0.74712644 0.75 0.74193548 0.70652174
0.76470588 0.71428571 0.70652174 0.77011494]
mean value: 0.7371031097416126
key: test_recall
value: [0.6 0.3 0.8 0.6 0.5 0.54545455
0.63636364 0.63636364 0.63636364 0.54545455]
mean value: 0.58
key: train_recall
value: [0.68421053 0.71578947 0.68421053 0.6 0.72631579 0.69148936
0.69148936 0.69148936 0.69148936 0.71276596]
mean value: 0.6889249720044793
key: test_roc_auc
value: [0.61818182 0.46818182 0.80909091 0.75454545 0.47727273 0.62272727
0.51818182 0.66818182 0.76818182 0.62272727]
mean value: 0.6327272727272727
key: train_roc_auc
value: [0.71976484 0.72491601 0.72508399 0.69893617 0.73549832 0.70363942
0.74048152 0.70890258 0.70363942 0.75111982]
mean value: 0.7211982082866741
key: test_jcc
value: [0.42857143 0.21428571 0.66666667 0.54545455 0.3125 0.42857143
0.41176471 0.5 0.58333333 0.42857143]
mean value: 0.45197192513368983
key: train_jcc
value: [0.55084746 0.56666667 0.55555556 0.5 0.57983193 0.53719008
0.57017544 0.54166667 0.53719008 0.5877193 ]
mean value: 0.5526843181420478
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'Z...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [2.61034989 5.89738679 5.05655122 4.04000521 4.81949782 4.60966992
4.59260941 4.32805228 4.22248673 2.80192494]
mean value: 4.297853422164917
key: score_time
value: [0.02688909 0.02042317 0.02070355 0.01861191 0.02235937 0.0204308
0.02161312 0.02076817 0.02501345 0.0135026 ]
mean value: 0.021031522750854494
key: test_mcc
value: [0.82275335 0.82275335 0.82275335 1. 1. 0.90829511
0.62641448 1. 1. 0.80909091]
mean value: 0.881206055224815
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9047619 0.9047619 0.9047619 1. 1. 0.95238095
0.80952381 1. 1. 0.9047619 ]
mean value: 0.9380952380952381
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.88888889 0.88888889 1. 1. 0.95652174
0.83333333 1. 1. 0.90909091]
mean value: 0.9365612648221344
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 1. 0.91666667
0.76923077 1. 1. 0.90909091]
mean value: 0.9594988344988344
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.8 0.8 1. 1. 1.
0.90909091 1. 1. 0.90909091]
mean value: 0.9218181818181819
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 0.9 0.9 1. 1. 0.95
0.80454545 1. 1. 0.90454545]
mean value: 0.9359090909090909
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.8 0.8 1. 1. 0.91666667
0.71428571 1. 1. 0.83333333]
mean value: 0.8864285714285715
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.06827331 0.07484531 0.05876684 0.03986669 0.06439209 0.07714224
0.06403208 0.0616889 0.03780341 0.04415965]
mean value: 0.0590970516204834
key: score_time
value: [0.02553988 0.02265811 0.02189136 0.01328444 0.01226473 0.01283622
0.02830124 0.02277637 0.01287508 0.0122354 ]
mean value: 0.01846628189086914
key: test_mcc
value: [0.66332496 0.45226702 0.43007562 0.63305416 0.44038551 0.71818182
0.60302269 0.71562645 0.71818182 0.67419986]
mean value: 0.6048319896654357
key: train_mcc
value: [0.98947251 0.93650616 0.97883539 0.98947368 0.97883539 0.95767077
0.94755736 0.93650616 0.94755736 0.96873621]
mean value: 0.9631150997764043
key: test_accuracy
value: [0.80952381 0.71428571 0.71428571 0.80952381 0.71428571 0.85714286
0.76190476 0.85714286 0.85714286 0.80952381]
mean value: 0.7904761904761904
key: train_accuracy
value: [0.99470899 0.96825397 0.98941799 0.99470899 0.98941799 0.97883598
0.97354497 0.96825397 0.97354497 0.98412698]
mean value: 0.9814814814814814
key: test_fscore
value: [0.75 0.625 0.66666667 0.81818182 0.72727273 0.85714286
0.70588235 0.86956522 0.85714286 0.77777778]
mean value: 0.7654632274517185
key: train_fscore
value: [0.9947644 0.96842105 0.98947368 0.99470899 0.98947368 0.9787234
0.97297297 0.96808511 0.97297297 0.98378378]
mean value: 0.9813380054035413
key: test_precision
value: [1. 0.83333333 0.75 0.75 0.66666667 0.9
1. 0.83333333 0.9 1. ]
mean value: 0.8633333333333333
key: train_precision
value: [0.98958333 0.96842105 0.98947368 1. 0.98947368 0.9787234
0.98901099 0.96808511 0.98901099 1. ]
mean value: 0.9861782243046241
key: test_recall
value: [0.6 0.5 0.6 0.9 0.8 0.81818182
0.54545455 0.90909091 0.81818182 0.63636364]
mean value: 0.7127272727272728
key: train_recall
value: [1. 0.96842105 0.98947368 0.98947368 0.98947368 0.9787234
0.95744681 0.96808511 0.95744681 0.96808511]
mean value: 0.9766629339305711
key: test_roc_auc
value: [0.8 0.70454545 0.70909091 0.81363636 0.71818182 0.85909091
0.77272727 0.85454545 0.85909091 0.81818182]
mean value: 0.7909090909090909
key: train_roc_auc
value: [0.99468085 0.96825308 0.98941769 0.99473684 0.98941769 0.97883539
0.97346025 0.96825308 0.97346025 0.98404255]
mean value: 0.9814557670772677
key: test_jcc
value: [0.6 0.45454545 0.5 0.69230769 0.57142857 0.75
0.54545455 0.76923077 0.75 0.63636364]
mean value: 0.6269330669330669
key: train_jcc
value: [0.98958333 0.93877551 0.97916667 0.98947368 0.97916667 0.95833333
0.94736842 0.93814433 0.94736842 0.96808511]
mean value: 0.9635465472799757
MCC on Blind test: 0.11
Accuracy on Blind test: 0.53
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02180147 0.01045227 0.01006985 0.0100987 0.00979686 0.00981426
0.00987458 0.01014495 0.01003695 0.01014376]
mean value: 0.011223363876342773
key: score_time
value: [0.00946331 0.00991654 0.00950646 0.00958514 0.00926518 0.00935149
0.00945377 0.00949931 0.00967503 0.00960541]
mean value: 0.00953216552734375
key: test_mcc
value: [0.14545455 0.13762047 0.63305416 0.33028913 0.30914104 0.45226702
0.13762047 0.61818182 0.71818182 0.23373675]
mean value: 0.3715547220669732
key: train_mcc
value: [0.41147388 0.40281841 0.38806379 0.41147388 0.39005594 0.40105488
0.40396007 0.42240682 0.43243527 0.41239882]
mean value: 0.40761417768557445
key: test_accuracy
value: [0.57142857 0.57142857 0.80952381 0.66666667 0.61904762 0.71428571
0.57142857 0.80952381 0.85714286 0.61904762]
mean value: 0.680952380952381
key: train_accuracy
value: [0.7037037 0.6984127 0.69312169 0.7037037 0.69312169 0.6984127
0.6984127 0.70899471 0.71428571 0.7037037 ]
mean value: 0.7015873015873015
key: test_fscore
value: [0.57142857 0.52631579 0.81818182 0.63157895 0.69230769 0.76923077
0.60869565 0.81818182 0.85714286 0.66666667]
mean value: 0.6959730582156212
key: train_fscore
value: [0.7254902 0.72463768 0.71 0.7254902 0.71568627 0.71641791
0.72195122 0.72636816 0.73 0.72277228]
mean value: 0.7218813914217747
key: test_precision
value: [0.54545455 0.55555556 0.75 0.66666667 0.5625 0.66666667
0.58333333 0.81818182 0.9 0.61538462]
mean value: 0.6663743201243202
key: train_precision
value: [0.67889908 0.66964286 0.67619048 0.67889908 0.66972477 0.6728972
0.66666667 0.68224299 0.68867925 0.67592593]
mean value: 0.6759768293904649
key: test_recall
value: [0.6 0.5 0.9 0.6 0.9 0.90909091
0.63636364 0.81818182 0.81818182 0.72727273]
mean value: 0.740909090909091
key: train_recall
value: [0.77894737 0.78947368 0.74736842 0.77894737 0.76842105 0.76595745
0.78723404 0.77659574 0.77659574 0.77659574]
mean value: 0.7746136618141097
key: test_roc_auc
value: [0.57272727 0.56818182 0.81363636 0.66363636 0.63181818 0.70454545
0.56818182 0.80909091 0.85909091 0.61363636]
mean value: 0.6804545454545454
key: train_roc_auc
value: [0.70330347 0.69792833 0.69283315 0.70330347 0.69272116 0.6987682
0.69888018 0.7093505 0.71461366 0.70408735]
mean value: 0.7015789473684211
key: test_jcc
value: [0.4 0.35714286 0.69230769 0.46153846 0.52941176 0.625
0.4375 0.69230769 0.75 0.5 ]
mean value: 0.5445208468002586
key: train_jcc
value: [0.56923077 0.56818182 0.5503876 0.56923077 0.55725191 0.55813953
0.5648855 0.5703125 0.57480315 0.56589147]
mean value: 0.5648315015480971
MCC on Blind test: 0.12
Accuracy on Blind test: 0.6
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01307464 0.01691055 0.01696324 0.01449084 0.018785 0.01570868
0.01574159 0.01785326 0.01694202 0.01764059]
mean value: 0.01641104221343994
key: score_time
value: [0.0097959 0.0115664 0.0114677 0.01176572 0.01179075 0.01177979
0.01176739 0.01171255 0.01223779 0.01183796]
mean value: 0.01157219409942627
key: test_mcc
value: [0.43007562 0.46249729 0.71562645 0.80909091 0.36244122 0.66332496
0.13762047 0.67419986 0.80909091 0.60302269]
mean value: 0.5666990369115905
key: train_mcc
value: [0.88757469 0.63581076 0.96830553 0.76193012 0.76291765 0.67012598
0.88607273 0.71085804 0.94755736 0.91860433]
mean value: 0.8149757191296667
key: test_accuracy
value: [0.71428571 0.66666667 0.85714286 0.9047619 0.66666667 0.80952381
0.57142857 0.80952381 0.9047619 0.76190476]
mean value: 0.7666666666666666
key: train_accuracy
value: [0.94179894 0.78835979 0.98412698 0.87830688 0.86772487 0.80952381
0.94179894 0.83597884 0.97354497 0.95767196]
mean value: 0.8978835978835978
key: test_fscore
value: [0.66666667 0.74074074 0.84210526 0.9 0.53333333 0.84615385
0.60869565 0.77777778 0.90909091 0.70588235]
mean value: 0.7530446542036258
key: train_fscore
value: [0.94472362 0.82608696 0.98429319 0.87150838 0.84848485 0.83928571
0.94358974 0.80254777 0.97297297 0.95555556]
mean value: 0.898904875380721
key: test_precision
value: [0.75 0.58823529 0.88888889 0.9 0.8 0.73333333
0.58333333 1. 0.90909091 1. ]
mean value: 0.8152881758764112
key: train_precision
value: [0.90384615 0.7037037 0.97916667 0.92857143 1. 0.72307692
0.91089109 1. 0.98901099 1. ]
mean value: 0.9138266953984776
key: test_recall
value: [0.6 1. 0.8 0.9 0.4 1.
0.63636364 0.63636364 0.90909091 0.54545455]
mean value: 0.7427272727272727
key: train_recall
value: [0.98947368 1. 0.98947368 0.82105263 0.73684211 1.
0.9787234 0.67021277 0.95744681 0.91489362]
mean value: 0.9058118701007839
key: test_roc_auc
value: [0.70909091 0.68181818 0.85454545 0.90454545 0.65454545 0.8
0.56818182 0.81818182 0.90454545 0.77272727]
mean value: 0.7668181818181818
key: train_roc_auc
value: [0.94154535 0.78723404 0.98409854 0.87861142 0.86842105 0.81052632
0.94199328 0.83510638 0.97346025 0.95744681]
mean value: 0.8978443449048152
key: test_jcc
value: [0.5 0.58823529 0.72727273 0.81818182 0.36363636 0.73333333
0.4375 0.63636364 0.83333333 0.54545455]
mean value: 0.6183311051693404
key: train_jcc
value: [0.8952381 0.7037037 0.96907216 0.77227723 0.73684211 0.72307692
0.89320388 0.67021277 0.94736842 0.91489362]
mean value: 0.8225888907479606
MCC on Blind test: 0.61
Accuracy on Blind test: 0.8
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01661968 0.01596975 0.01542592 0.01607823 0.01489186 0.01536465
0.01502705 0.01687884 0.03487134 0.02883863]
mean value: 0.018996596336364746
key: score_time
value: [0.01196527 0.01202655 0.01178312 0.01207161 0.01201534 0.01186323
0.01198721 0.01221204 0.02217579 0.02880287]
mean value: 0.014690303802490234
key: test_mcc
value: [0.53935989 0. 0.74161985 0.90909091 0.53300179 0.50874702
0.03739788 0.90909091 0.50874702 0.26593594]
mean value: 0.49529912072758353
key: train_mcc
value: [0.94713854 0.47421554 0.89436546 0.87787601 0.51260702 0.64546146
0.84944554 0.83355494 0.38837405 0.37937244]
mean value: 0.6802411002611571
key: test_accuracy
value: [0.76190476 0.52380952 0.85714286 0.95238095 0.71428571 0.71428571
0.52380952 0.95238095 0.71428571 0.61904762]
mean value: 0.7333333333333333
key: train_accuracy
value: [0.97354497 0.68253968 0.94708995 0.93650794 0.70899471 0.79365079
0.92063492 0.91005291 0.62962963 0.62433862]
mean value: 0.8126984126984127
key: test_fscore
value: [0.70588235 0. 0.82352941 0.95238095 0.76923077 0.78571429
0.58333333 0.95238095 0.78571429 0.71428571]
mean value: 0.7072452057746176
key: train_fscore
value: [0.97382199 0.53846154 0.94791667 0.94 0.7755102 0.82819383
0.92537313 0.9005848 0.72868217 0.72586873]
mean value: 0.8284413057399109
key: test_precision
value: [0.85714286 0. 1. 0.90909091 0.625 0.64705882
0.53846154 1. 0.64705882 0.58823529]
mean value: 0.6812048245871776
key: train_precision
value: [0.96875 1. 0.93814433 0.8952381 0.63333333 0.70676692
0.86915888 1. 0.57317073 0.56969697]
mean value: 0.8154259255670528
key: test_recall
value: [0.6 0. 0.7 1. 1. 1.
0.63636364 0.90909091 1. 0.90909091]
mean value: 0.7754545454545454
key: train_recall
value: [0.97894737 0.36842105 0.95789474 0.98947368 1. 1.
0.9893617 0.81914894 1. 1. ]
mean value: 0.9103247480403136
key: test_roc_auc
value: [0.75454545 0.5 0.85 0.95454545 0.72727273 0.7
0.51818182 0.95454545 0.7 0.60454545]
mean value: 0.7263636363636363
key: train_roc_auc
value: [0.97351624 0.68421053 0.94703247 0.9362262 0.70744681 0.79473684
0.92099664 0.90957447 0.63157895 0.62631579]
mean value: 0.8131634938409854
key: test_jcc
value: [0.54545455 0. 0.7 0.90909091 0.625 0.64705882
0.41176471 0.90909091 0.64705882 0.55555556]
mean value: 0.5950074272133096
key: train_jcc
value: [0.94897959 0.36842105 0.9009901 0.88679245 0.63333333 0.70676692
0.86111111 0.81914894 0.57317073 0.56969697]
mean value: 0.726841119562058
MCC on Blind test: 0.67
Accuracy on Blind test: 0.8
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.15451217 0.15445352 0.2040379 0.15504122 0.15569568 0.15498495
0.15341592 0.16030359 0.15658045 0.15535378]
mean value: 0.1604379177093506
key: score_time
value: [0.021106 0.02071619 0.02216125 0.0211637 0.02109528 0.021137
0.02115512 0.02139163 0.02109694 0.02112269]
mean value: 0.02121458053588867
key: test_mcc
value: [0.90829511 0.90829511 0.90829511 0.82275335 0.90829511 0.71562645
0.52295779 1. 0.90829511 0.90909091]
mean value: 0.8511904027211744
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95238095 0.95238095 0.95238095 0.9047619 0.95238095 0.85714286
0.76190476 1. 0.95238095 0.95238095]
mean value: 0.9238095238095237
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94736842 0.94736842 0.94736842 0.88888889 0.94736842 0.86956522
0.7826087 1. 0.95652174 0.95238095]
mean value: 0.9239439177654281
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 1. 0.83333333
0.75 1. 0.91666667 1. ]
mean value: 0.95
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9 0.9 0.9 0.8 0.9 0.90909091
0.81818182 1. 1. 0.90909091]
mean value: 0.9036363636363637
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95 0.95 0.95 0.9 0.95 0.85454545
0.75909091 1. 0.95 0.95454545]
mean value: 0.9218181818181819
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9 0.9 0.9 0.8 0.9 0.76923077
0.64285714 1. 0.91666667 0.90909091]
mean value: 0.8637845487845488
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.87
Accuracy on Blind test: 0.93
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.0783937 0.06188798 0.0702672 0.065274 0.08460093 0.07100463
0.07129097 0.0830369 0.0656898 0.09316111]
mean value: 0.07446072101593018
key: score_time
value: [0.02479196 0.02323222 0.02332997 0.02291703 0.02248645 0.02907228
0.0261395 0.02889991 0.02425504 0.03242946]
mean value: 0.02575538158416748
key: test_mcc
value: [0.82275335 0.82275335 1. 0.90829511 0.90829511 0.80909091
0.62641448 1. 1. 0.80909091]
mean value: 0.8706693216360863
key: train_mcc
value: [0.98947368 1. 0.98947368 0.98947368 0.97905701 1.
1. 0.98947251 0.97905701 0.96830907]
mean value: 0.9884316657513018
key: test_accuracy
value: [0.9047619 0.9047619 1. 0.95238095 0.95238095 0.9047619
0.80952381 1. 1. 0.9047619 ]
mean value: 0.9333333333333333
key: train_accuracy
value: [0.99470899 1. 0.99470899 0.99470899 0.98941799 1.
1. 0.99470899 0.98941799 0.98412698]
mean value: 0.9941798941798942
key: test_fscore
value: [0.88888889 0.88888889 1. 0.94736842 0.94736842 0.90909091
0.83333333 1. 1. 0.90909091]
mean value: 0.9324029771398192
key: train_fscore
value: [0.99470899 1. 0.99470899 0.99470899 0.9893617 1.
1. 0.99465241 0.98947368 0.98412698]
mean value: 0.9941741761009266
key: test_precision
value: [1. 1. 1. 1. 1. 0.90909091
0.76923077 1. 1. 0.90909091]
mean value: 0.9587412587412587
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.97916667 0.97894737]
mean value: 0.995811403508772
key: test_recall
value: [0.8 0.8 1. 0.9 0.9 0.90909091
0.90909091 1. 1. 0.90909091]
mean value: 0.9127272727272727
key: train_recall
value: [0.98947368 1. 0.98947368 0.98947368 0.97894737 1.
1. 0.9893617 1. 0.9893617 ]
mean value: 0.9926091825307951
key: test_roc_auc
value: [0.9 0.9 1. 0.95 0.95 0.90454545
0.80454545 1. 1. 0.90454545]
mean value: 0.9313636363636364
key: train_roc_auc
value: [0.99473684 1. 0.99473684 0.99473684 0.98947368 1.
1. 0.99468085 0.98947368 0.98415454]
mean value: 0.9941993281075028
key: test_jcc
value: [0.8 0.8 1. 0.9 0.9 0.83333333
0.71428571 1. 1. 0.83333333]
mean value: 0.8780952380952382
key: train_jcc
value: [0.98947368 1. 0.98947368 0.98947368 0.97894737 1.
1. 0.9893617 0.97916667 0.96875 ]
mean value: 0.9884646789846958
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
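The per-model blocks in this log (fit_time, score_time, then a test_/train_ pair for each metric, ten values each) have the shape of the dictionary returned by scikit-learn's cross_validate with return_train_score=True. A minimal sketch of how output in this shape could be produced is given below. The synthetic data, the column names num_a/num_b/cat_a, the scorer definitions and the StratifiedKFold settings are assumptions for illustration and are not taken from ml_data_sl.py or pnca_sl.py; only the pipeline layout (MinMaxScaler and OneHotEncoder inside a ColumnTransformer, followed by the model) mirrors what is printed above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (assumption): 210 rows gives 21 samples per test fold.
rng = np.random.default_rng(42)
n = 210
X = pd.DataFrame({
    "num_a": rng.normal(size=n),
    "num_b": rng.normal(size=n),
    "cat_a": rng.choice(["x", "y", "z"], size=n),
})
y = (X["num_a"] + 0.5 * X["num_b"] > 0).astype(int)

# Same layout as the logged pipeline: scale numeric columns, one-hot encode categorical ones.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["num_a", "num_b"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat_a"]),
    ],
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", BaggingClassifier(random_state=42))])

# Scorer names are chosen so the result keys read test_mcc, train_mcc, ... (assumption).
# roc_auc is computed on hard labels here, where it equals (recall + specificity) / 2.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Comparing each train_ mean with its test_ mean, as the blocks in this log allow, is a quick way to spot models that fit the training folds perfectly but generalise less well.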
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.09878612 0.09960461 0.09258199 0.09417605 0.09641194 0.08949471
0.09223723 0.11594605 0.1179924 0.12251711]
mean value: 0.10197482109069825
key: score_time
value: [0.03881836 0.03061247 0.03737235 0.03541827 0.03377557 0.02945137
0.03179479 0.0367341 0.0444777 0.0131073 ]
mean value: 0.03315622806549072
key: test_mcc
value: [ 0.23373675 0.62641448 0.74161985 0.33636364 0.63305416 0.42727273
-0.03739788 0.82572282 0.4719399 0.67419986]
mean value: 0.49329263185333116
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61904762 0.80952381 0.85714286 0.66666667 0.80952381 0.71428571
0.47619048 0.9047619 0.71428571 0.80952381]
mean value: 0.7380952380952381
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.55555556 0.77777778 0.82352941 0.66666667 0.81818182 0.72727273
0.42105263 0.9 0.66666667 0.77777778]
mean value: 0.7134481033242643
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.625 0.875 1. 0.63636364 0.75 0.72727273
0.5 1. 0.85714286 1. ]
mean value: 0.797077922077922
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5 0.7 0.7 0.7 0.9 0.72727273
0.36363636 0.81818182 0.54545455 0.63636364]
mean value: 0.6590909090909091
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.61363636 0.80454545 0.85 0.66818182 0.81363636 0.71363636
0.48181818 0.90909091 0.72272727 0.81818182]
mean value: 0.7395454545454545
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.38461538 0.63636364 0.7 0.5 0.69230769 0.57142857
0.26666667 0.81818182 0.5 0.63636364]
mean value: 0.5705927405927406
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.17
Accuracy on Blind test: 0.6
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.4725976 0.47249556 0.44347739 0.43946433 0.55229735 0.44706702
0.44039941 0.434376 0.51444864 0.48032069]
mean value: 0.4696943998336792
key: score_time
value: [0.01501417 0.01269388 0.0131402 0.0126729 0.01365566 0.01293302
0.01276636 0.01274252 0.01311779 0.0128448 ]
mean value: 0.013158130645751952
key: test_mcc
value: [0.82275335 0.90829511 1. 0.90829511 1. 1.
0.62641448 1. 1. 1. ]
mean value: 0.9265758046971604
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9047619 0.95238095 1. 0.95238095 1. 1.
0.80952381 1. 1. 1. ]
mean value: 0.9619047619047619
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.94736842 1. 0.94736842 1. 1.
0.83333333 1. 1. 1. ]
mean value: 0.9616959064327486
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 1. 1.
0.76923077 1. 1. 1. ]
mean value: 0.9769230769230769
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.9 1. 0.9 1. 1.
0.90909091 1. 1. 1. ]
mean value: 0.9509090909090909
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 0.95 1. 0.95 1. 1.
0.80454545 1. 1. 1. ]
mean value: 0.9604545454545454
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.9 1. 0.9 1. 1.
0.71428571 1. 1. 1. ]
mean value: 0.9314285714285715
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03971577 0.05372071 0.04902482 0.0534029 0.05441594 0.0602479
0.05954766 0.05021262 0.05034852 0.06813431]
mean value: 0.05387711524963379
key: score_time
value: [0.02256966 0.02043724 0.01937819 0.01983213 0.01934934 0.01934004
0.01987314 0.02205038 0.07356358 0.0410378 ]
mean value: 0.027743148803710937
key: test_mcc
value: [0.60302269 0.82572282 0.67419986 0.60302269 0.60302269 0.66332496
0.50874702 0.66332496 0.74161985 0.82275335]
mean value: 0.6708760888902932
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.76190476 0.9047619 0.80952381 0.76190476 0.76190476 0.80952381
0.71428571 0.80952381 0.85714286 0.9047619 ]
mean value: 0.8095238095238095
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.90909091 0.83333333 0.8 0.8 0.84615385
0.78571429 0.84615385 0.88 0.91666667]
mean value: 0.8417112887112888
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.83333333 0.71428571 0.66666667 0.66666667 0.73333333
0.64705882 0.73333333 0.78571429 0.84615385]
mean value: 0.7293212669683258
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.77272727 0.90909091 0.81818182 0.77272727 0.77272727 0.8
0.7 0.8 0.85 0.9 ]
mean value: 0.8095454545454546
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.83333333 0.71428571 0.66666667 0.66666667 0.73333333
0.64705882 0.73333333 0.78571429 0.84615385]
mean value: 0.7293212669683258
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.6
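A blind-test MCC of 0.0 alongside an accuracy of 0.6 is the pattern one would see if a classifier assigned every blind-test sample to the same class, which would also be consistent with the test_recall of 1.0 in every CV fold above. The log does not print the confusion matrix, so this is only one possible explanation. The sketch below uses hypothetical counts (9 positives and 6 negatives, all predicted positive) and adopts the convention that MCC is 0 when its denominator is zero.

```python
from math import sqrt

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical blind test set: 9 positives, 6 negatives (not taken from the log).
y_true = [1] * 9 + [0] * 6
# A degenerate classifier that predicts the positive class for every sample.
y_pred = [1] * 15

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)

print("Accuracy:", round(accuracy, 2))
print("MCC:", round(mcc(tp, tn, fp, fn), 2))
```

Accuracy here simply equals the share of the majority class, while MCC stays at zero because no negative is ever predicted, which is why the two numbers can disagree so sharply.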
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.03402305 0.05161929 0.07370734 0.047405 0.04301023 0.04708791
0.04369712 0.04251552 0.04322314 0.04266644]
mean value: 0.046895503997802734
key: score_time
value: [0.03400087 0.05133557 0.03652644 0.0336163 0.03723359 0.03171778
0.0283761 0.03356314 0.02800155 0.03366947]
mean value: 0.03480408191680908
key: test_mcc
value: [0.74161985 0.62641448 0.74161985 0.90909091 0.71818182 0.71818182
0.23636364 1. 0.80909091 0.82572282]
mean value: 0.732628609547866
key: train_mcc
value: [0.95767077 0.92597156 0.96830907 0.94714446 0.96830553 0.93672304
0.95767077 0.95767077 0.95788064 0.96830907]
mean value: 0.9545655686835185
key: test_accuracy
value: [0.85714286 0.80952381 0.85714286 0.95238095 0.85714286 0.85714286
0.61904762 1. 0.9047619 0.9047619 ]
mean value: 0.8619047619047618
key: train_accuracy
value: [0.97883598 0.96296296 0.98412698 0.97354497 0.98412698 0.96825397
0.97883598 0.97883598 0.97883598 0.98412698]
mean value: 0.9772486772486773
key: test_fscore
value: [0.82352941 0.77777778 0.82352941 0.95238095 0.85714286 0.85714286
0.63636364 1. 0.90909091 0.9 ]
mean value: 0.8536957813428402
key: train_fscore
value: [0.97894737 0.96335079 0.98412698 0.97354497 0.98429319 0.96842105
0.9787234 0.9787234 0.97849462 0.98412698]
mean value: 0.9772752774075717
key: test_precision
value: [1. 0.875 1. 0.90909091 0.81818182 0.9
0.63636364 1. 0.90909091 1. ]
mean value: 0.9047727272727273
key: train_precision
value: [0.97894737 0.95833333 0.9893617 0.9787234 0.97916667 0.95833333
0.9787234 0.9787234 0.98913043 0.97894737]
mean value: 0.9768390419851665
key: test_recall
value: [0.7 0.7 0.7 1. 0.9 0.81818182
0.63636364 1. 0.90909091 0.81818182]
mean value: 0.8181818181818181
key: train_recall
value: [0.97894737 0.96842105 0.97894737 0.96842105 0.98947368 0.9787234
0.9787234 0.9787234 0.96808511 0.9893617 ]
mean value: 0.9777827547592385
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_sl.py:188: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_sl.py:191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_roc_auc
value: [0.85 0.80454545 0.85 0.95454545 0.85909091 0.85909091
0.61818182 1. 0.90454545 0.90909091]
mean value: 0.8609090909090908
key: train_roc_auc
value: [0.97883539 0.96293393 0.98415454 0.97357223 0.98409854 0.96830907
0.97883539 0.97883539 0.9787794 0.98415454]
mean value: 0.9772508398656214
key: test_jcc
value: [0.7 0.63636364 0.7 0.90909091 0.75 0.75
0.46666667 1. 0.83333333 0.81818182]
mean value: 0.7563636363636363
key: train_jcc
value: [0.95876289 0.92929293 0.96875 0.94845361 0.96907216 0.93877551
0.95833333 0.95833333 0.95789474 0.96875 ]
mean value: 0.9556418502799597
MCC on Blind test: 0.72
Accuracy on Blind test: 0.87
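The SettingWithCopyWarning messages raised from pnca_sl.py lines 188 and 191 in this block come from calling sort_values(..., inplace = True) on a DataFrame that pandas believes was derived from another one. A minimal sketch of one common way to avoid the warning is shown below: take an explicit copy of the columns of interest and assign the sorted result instead of sorting in place. The frame all_scores and the names cv_table and bt_table are hypothetical stand-ins (how rouC_CT and rouC_BT are built is not shown in the log); the numbers are the mean test_mcc and blind-test MCC values printed in this section, rounded to two decimals.

```python
import pandas as pd

# Hypothetical summary table built from values reported in this section (rounded).
all_scores = pd.DataFrame({
    "model": ["Bagging Classifier", "Gaussian Process", "Gradient Boosting",
              "QDA", "Ridge Classifier", "Ridge ClassifierCV"],
    "test_mcc": [0.87, 0.49, 0.93, 0.67, 0.73, 0.75],
    "bts_mcc": [0.72, 0.17, 0.72, 0.00, 0.72, 0.60],
})

# Take explicit copies of the column subsets so later operations never act on
# a possible view of all_scores.
cv_table = all_scores[["model", "test_mcc"]].copy()
bt_table = all_scores[["model", "bts_mcc"]].copy()

# Assign the sorted result rather than using inplace=True on a derived frame.
cv_table = cv_table.sort_values(by=["test_mcc"], ascending=False)
bt_table = bt_table.sort_values(by=["bts_mcc"], ascending=False)

print(cv_table.to_string(index=False))
print(bt_table.to_string(index=False))
```

Either change on its own (the explicit .copy() or the non-inplace assignment) is usually enough to silence the warning; using both keeps the intent unambiguous.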
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=166)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.38085413 0.38319826 0.41932321 0.38498354 0.38405871 0.38967252
0.35881495 0.39832497 0.41211724 0.37258697]
mean value: 0.3883934497833252
key: score_time
value: [0.03946233 0.02718139 0.03768969 0.03861022 0.03152013 0.03818846
0.03179121 0.0374887 0.02729797 0.03727818]
mean value: 0.0346508264541626
key: test_mcc
value: [0.74161985 0.74161985 0.66332496 0.90909091 0.71818182 0.80909091
0.23636364 1. 0.90829511 0.74795759]
mean value: 0.7475544626453499
key: train_mcc
value: [0.95767077 0.95767077 0.96830907 0.94714446 0.94714446 0.95767077
0.95767077 0.95767077 0.96830553 0.95767077]
mean value: 0.9576928147788344
key: test_accuracy
value: [0.85714286 0.85714286 0.80952381 0.95238095 0.85714286 0.9047619
0.61904762 1. 0.95238095 0.85714286]
mean value: 0.8666666666666667
key: train_accuracy
value: [0.97883598 0.97883598 0.98412698 0.97354497 0.97354497 0.97883598
0.97883598 0.97883598 0.98412698 0.97883598]
mean value: 0.9788359788359788
key: test_fscore
value: [0.82352941 0.82352941 0.75 0.95238095 0.85714286 0.90909091
0.63636364 1. 0.95652174 0.84210526]
mean value: 0.8550664180796096
key: train_fscore
value: [0.97894737 0.97894737 0.98412698 0.97354497 0.97354497 0.9787234
0.9787234 0.9787234 0.98395722 0.9787234 ]
mean value: 0.978796250433165
key: test_precision
value: [1. 1. 1. 0.90909091 0.81818182 0.90909091
0.63636364 1. 0.91666667 1. ]
mean value: 0.918939393939394
key: train_precision
value: [0.97894737 0.97894737 0.9893617 0.9787234 0.9787234 0.9787234
0.9787234 0.9787234 0.98924731 0.9787234 ]
mean value: 0.9808844176329636
key: test_recall
value: [0.7 0.7 0.6 1. 0.9 0.90909091
0.63636364 1. 1. 0.72727273]
mean value: 0.8172727272727273
key: train_recall
value: [0.97894737 0.97894737 0.97894737 0.96842105 0.96842105 0.9787234
0.9787234 0.9787234 0.9787234 0.9787234 ]
mean value: 0.9767301231802912
key: test_roc_auc
value: [0.85 0.85 0.8 0.95454545 0.85909091 0.90454545
0.61818182 1. 0.95 0.86363636]
mean value: 0.865
key: train_roc_auc
value: [0.97883539 0.97883539 0.98415454 0.97357223 0.97357223 0.97883539
0.97883539 0.97883539 0.98409854 0.97883539]
mean value: 0.9788409854423291
key: test_jcc
value: [0.7 0.7 0.6 0.90909091 0.75 0.83333333
0.46666667 1. 0.91666667 0.72727273]
mean value: 0.7603030303030303
key: train_jcc
value: [0.95876289 0.95876289 0.96875 0.94845361 0.94845361 0.95833333
0.95833333 0.95833333 0.96842105 0.95833333]
mean value: 0.9584937375655634
MCC on Blind test: 0.6
Accuracy on Blind test: 0.8