How to use BaseCrossValidator object

Hello. I'm trying to modify the cross-validation example https://automl.github.io/auto-sklearn/master/examples/example_crossvalidation.html#sphx-glr-examples-example-crossvalidation-py to use a BaseCrossValidator object, for example LeaveOneOut, as the resampling_strategy argument, but I can't figure out how to do it.

import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import LeaveOneOut
import autosklearn.classification
tmp_folder = '/mnt/e/autosklearn_parallel_example_tmp'
output_folder = '/mnt/e/autosklearn_parallel_example_out'

def main():
    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=120,
        per_run_time_limit=30,
        tmp_folder=tmp_folder,
        output_folder=output_folder,
        delete_tmp_folder_after_terminate=False,
        #first try
#        resampling_strategy='LeaveOneOut',
#        resampling_strategy_arguments={},
        #second try
#        resampling_strategy='TrainEvaluator',
#        resampling_strategy_arguments={'LeaveOneOut': {}},
        #third try
#        resampling_strategy='BaseCrossValidator',
#        resampling_strategy_arguments={'LeaveOneOut': {}},
        #fourth try
        resampling_strategy=LeaveOneOut(),
        resampling_strategy_arguments={},
    )

    # fit() changes the data in place, but refit needs the original data. We
    # therefore copy the data. In practice, one should reload the data
    automl.fit(X_train.copy(), y_train.copy(), dataset_name='breast_cancer')
    # During fit(), models are fit on individual cross-validation folds. To use
    # all available data, we call refit() which trains all models in the
    # final ensemble on the whole dataset.
    automl.refit(X_train.copy(), y_train.copy())

    print(automl.show_models())

    predictions = automl.predict(X_test)
    print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))


if __name__ == '__main__':
    main()

The first three attempts all fail with this error:

Traceback (most recent call last):
  File "test_420_ASKL.py", line 52, in <module>
    main()
  File "test_420_ASKL.py", line 39, in main
    automl.fit(X_train.copy(), y_train.copy(), dataset_name='breast_cancer')
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/site-packages/autosklearn/estimators.py", line 500, in fit
    dataset_name=dataset_name,
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/site-packages/autosklearn/estimators.py", line 267, in fit
    self._automl.fit(*args, **kwargs)
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/site-packages/autosklearn/automl.py", line 965, in fit
    only_return_configuration_space=only_return_configuration_space,
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/site-packages/autosklearn/automl.py", line 203, in fit
    only_return_configuration_space,
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/site-packages/autosklearn/automl.py", line 322, in _fit
    and not issubclass(self._resampling_strategy, BaseCrossValidator)
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/abc.py", line 228, in __subclasscheck__
    if issubclass(subclass, scls):
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/abc.py", line 232, in __subclasscheck__
    cls._abc_negative_cache.add(subclass)
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/_weakrefset.py", line 84, in add
    self.data.add(ref(item, self._remove))
TypeError: cannot create weak reference to 'str' object
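
The traceback shows the strategy being passed to issubclass(), which only accepts classes. Here is a minimal sketch of how such a check treats a string, an instance, and a class. It uses stand-in classes rather than scikit-learn: `BaseSplitter`, `LeaveOneOutLike`, and `describe` are hypothetical names chosen for illustration.

```python
import inspect
from abc import ABC


class BaseSplitter(ABC):
    """Stand-in for an abstract cross-validator base class (hypothetical)."""


class LeaveOneOutLike(BaseSplitter):
    """Stand-in for a concrete cross-validator (hypothetical)."""


def describe(strategy):
    """Report how a class-based issubclass() check would treat `strategy`."""
    if not inspect.isclass(strategy):
        return "not a class"
    return "subclass" if issubclass(strategy, BaseSplitter) else "unrelated class"


# A string is not a class, so issubclass() against an ABC raises TypeError.
try:
    issubclass("LeaveOneOut", BaseSplitter)
    string_check = "no error"
except TypeError:
    string_check = "TypeError"

print(string_check)
print(describe(LeaveOneOutLike))    # the class itself
print(describe(LeaveOneOutLike()))  # an instance
print(describe("LeaveOneOut"))      # a string
```

Only the class itself passes a check written this way. A string or an instance does not.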

On the fourth try I get a different error:

Traceback (most recent call last):
  File "test_420_ASKL.py", line 55, in <module>
    main()
  File "test_420_ASKL.py", line 42, in main
    automl.fit(X_train.copy(), y_train.copy(), dataset_name='breast_cancer')
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/site-packages/autosklearn/estimators.py", line 500, in fit
    dataset_name=dataset_name,
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/site-packages/autosklearn/estimators.py", line 267, in fit
    self._automl.fit(*args, **kwargs)
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/site-packages/autosklearn/automl.py", line 965, in fit
    only_return_configuration_space=only_return_configuration_space,
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/site-packages/autosklearn/automl.py", line 203, in fit
    only_return_configuration_space,
  File "/tmp/yes/envs/AutoSLK_42/lib/python3.6/site-packages/autosklearn/automl.py", line 326, in _fit
    self._resampling_strategy)
ValueError: Illegal resampling strategy: LeaveOneOut()

What is the right way to do this?

Top GitHub Comments

khenrix commented, Mar 26, 2019

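Hey, I had the same problem. I solved it by passing the cross-validator class itself (not an instance) as resampling_strategy and supplying all of its constructor arguments through resampling_strategy_arguments.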

Example:

from sklearn.model_selection import KFold

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder='/tmp/autosklearn_cv_example_tmp',
    output_folder='/tmp/autosklearn_cv_example_out',
    delete_tmp_folder_after_terminate=False,
    # Pass the class, not an instance such as KFold().
    resampling_strategy=KFold,
    resampling_strategy_arguments={'n_splits': 5, 'shuffle': False, 'random_state': None},
)

This is as stated in the API docs:

BaseCrossValidator or _RepeatedSplits or BaseShuffleSplit object: all arguments required by chosen class as specified in scikit-learn documentation. If arguments are not provided, scikit-learn defaults are used. If no defaults are available, an exception is raised. Refer to the ‘n_splits’ argument as ‘folds’.
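
One way to read that passage: the class and its keyword arguments are handed over separately, and the object is built later from the two. Below is a minimal sketch of that pattern under that assumption. It is not auto-sklearn's actual code, and `build_splitter`, `ToyKFold`, and `ToyNeedsData` are hypothetical stand-ins.

```python
import inspect


class ToyKFold:
    """Stand-in splitter whose constructor arguments all have defaults (hypothetical)."""

    def __init__(self, n_splits=5, shuffle=False, random_state=None):
        self.n_splits = n_splits
        self.shuffle = shuffle
        self.random_state = random_state


class ToyNeedsData:
    """Stand-in splitter with a required constructor argument (hypothetical)."""

    def __init__(self, X):
        self.X = X


def build_splitter(strategy, arguments=None):
    """Instantiate `strategy` (a class) with `arguments` (a dict of kwargs)."""
    if not inspect.isclass(strategy):
        raise ValueError("expected a class, got %r" % (strategy,))
    # Keys that are left out fall back to the constructor defaults; a missing
    # required argument makes the constructor raise TypeError.
    return strategy(**(arguments or {}))


# Only n_splits is given, so shuffle and random_state keep their defaults.
splitter = build_splitter(ToyKFold, {"n_splits": 3})
print(splitter.n_splits, splitter.shuffle)

# ToyNeedsData has no default for X, so building it without arguments fails.
try:
    build_splitter(ToyNeedsData)
    outcome = "built"
except TypeError:
    outcome = "TypeError"
print(outcome)
```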

khenrix commented, Jun 10, 2020

Maybe someone can look into it a bit further and find the actual cause. You could try this and work from there. It gave me a terrible score, but it ran 🤷‍♂️ Best of luck!

import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import LeaveOneOut as LOO
import autosklearn.regression


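class LeaveOneOut(LOO):
    # sklearn's LeaveOneOut.get_n_splits() raises a ValueError when X is None.
    # This wrapper stores X at construction time so the call also works
    # when no data is passed in.

    def __init__(self, X):
        self.X = X

    def get_n_splits(self, X=None, y=None, groups=None):
        # Ignore the arguments and always count splits on the stored X.
        return super().get_n_splits(self.X)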


def main():
    X, y = sklearn.datasets.load_boston(return_X_y=True)
    feature_types = (['numerical'] * 3) + ['categorical'] + (['numerical'] * 9)
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

    automl = autosklearn.regression.AutoSklearnRegressor(
        time_left_for_this_task=120,
        per_run_time_limit=30,
        tmp_folder='/tmp/autosklearn_regression_example_tmp',
        output_folder='/tmp/autosklearn_regression_example_out',
        resampling_strategy=LeaveOneOut, 
        resampling_strategy_arguments={'X': X_train}
    )
    automl.fit(X_train.copy(), y_train.copy(), dataset_name='boston',
               feat_type=feature_types)
    automl.refit(X_train.copy(), y_train.copy())

    print(automl.show_models())
    predictions = automl.predict(X_test)
    print("R2 score:", sklearn.metrics.r2_score(y_test, predictions))


if __name__ == '__main__':
    main()
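
The wrapper has to hold on to X because a leave-one-out splitter has no fixed number of splits: there is one split per sample. A plain-Python sketch of the scheme follows. The helper `leave_one_out_indices` is hypothetical and only mimics the idea; it is not scikit-learn's implementation.

```python
def leave_one_out_indices(n_samples):
    """Yield (train_indices, test_indices) pairs, holding out one sample at a time."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, [held_out]


# Four samples give four splits, each with a single test index.
splits = list(leave_one_out_indices(4))
for train, test in splits:
    print(train, test)
```

Leave-one-out therefore needs as many model fits as there are samples for each candidate model.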
