/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 271
PASS: my_features_df and aa_df successfully combined
nrows: 271
ncols: 269
count of NULL values before imputation
or_mychisq          256
log10_or_mychisq    256
dtype: int64
count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64
PASS: OR values imputed, data ready for ML
No. of numerical features: 45
No. of categorical features: 7
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
Original Data
Counter({0: 7, 1: 1}) Data dim: (8, 52)
-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training
actual values: training set
imputed values: blind test set
Train data size: (8, 52)
Test data size: (263, 52)
y_train numbers: Counter({0: 7, 1: 1})
y_train ratio: 7.0
y_test_numbers: Counter({0: 262, 1: 1})
y_test ratio: 262.0
-------------------------------------------------------------
Simple Random OverSampling
Counter({0: 7, 1: 7})
(14, 52)
Simple Random UnderSampling
Counter({0: 1, 1: 1})
(2, 52)
Simple Combined Over and UnderSampling
Counter({0: 7, 1: 7})
(14, 52)
Traceback (most recent call last):
  File "/home/tanu/git/LSHTM_analysis/scripts/ml/./alr_config.py", line 26, in <module>
    setvars(gene,drug)
  File "/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py", line 701, in setvars
    X_smnc, y_smnc = sm_nc.fit_resample(X, y)
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/imblearn/base.py", line 83, in fit_resample
    output = self._fit_resample(X, y)
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/imblearn/over_sampling/_smote/base.py", line 533, in _fit_resample
    X_resampled, y_resampled = super()._fit_resample(X_encoded, y)
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/imblearn/over_sampling/_smote/base.py", line 324, in _fit_resample
    nns = self.nn_k_.kneighbors(X_class, return_distance=False)[:, 1:]
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neighbors/_base.py", line 749, in kneighbors
    raise ValueError(
ValueError: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6
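The final ValueError is consistent with the class counts printed earlier in the log: y_train holds a single sample of class 1, while the SMOTE-style sampler (sm_nc, which appears to be a SMOTENC instance) asks for 6 neighbours, presumably the default k_neighbors=5 plus the sample itself. A minimal sketch of a pre-check follows, assuming the sampler needs at least k_neighbors + 1 samples in the smallest class. The helper name safe_k_neighbors is hypothetical and is not part of ml_data.py or imbalanced-learn.

```python
from collections import Counter


def safe_k_neighbors(y, requested=5):
    """Return a k_neighbors value the smallest class can support, or None.

    Hypothetical helper: assumes a SMOTE-style sampler queries
    k_neighbors + 1 neighbours within each minority class, so the
    smallest class must contain at least k_neighbors + 1 samples.
    """
    smallest_class = min(Counter(y).values())
    # Largest k that still leaves room for the sample itself.
    k = min(requested, smallest_class - 1)
    # With fewer than 2 samples there is no neighbour to interpolate with.
    return k if k >= 1 else None


# Class counts taken from the log above: Counter({0: 7, 1: 1})
y_train = [0] * 7 + [1]
k = safe_k_neighbors(y_train)
if k is None:
    print("Minority class too small for SMOTE-style resampling; skipping.")
else:
    print(f"Usable k_neighbors: {k}")
```

If the helper returns an integer, it could be passed as k_neighbors when constructing the sampler. If it returns None, as it does for the counts in this log, one option is to skip the SMOTE step for this split and rely on the random oversampling that already completed above.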