Passing Pipeline Variable for entry_point while using XGBoost Estimator in script mode fails

See original GitHub issue

Describe the bug Passing a Pipeline Variable (e.g. an S3 URL) for entry_point while using the XGBoost Estimator in script mode fails to generate a pipeline definition, because the estimator tries to parse the variable at definition time and fails during URL parsing.

To reproduce

model_output_path = ParameterString(name="ModelOutputPath")

xgb_script_mode_estimator = XGBoost(
    entry_point=script_path,
    framework_version="1.5-1",  # Note: framework_version is mandatory
    role=role,
    instance_count=training_instance_count,
    instance_type=training_instance_type,
    output_path=model_output_path,
    hyperparameters={
        "eval_metric": eval_metric,
        "min_child_weight": min_child_weight,
    },
    checkpoint_s3_uri=checkpoint_path,
)

train_input = TrainingInput(
    s3_processed_training_samples_url, content_type=content_type
)
validation_input = TrainingInput(
    s3_processed_testing_samples_url, content_type=content_type
)

step_train = TrainingStep(
    name="TrainModel",
    estimator=xgb_script_mode_estimator,
    inputs={"train": train_input, "validation": validation_input},
)
pipeline = get_pipeline(
    region=region,
    role=role,
    default_bucket=default_bucket,
    model_package_group_name=model_package_group_name,
    pipeline_name=pipeline_name
)

import json
json.loads(pipeline.definition())

generates the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-7d78ecac635c> in <module>
      1 import json
----> 2 json.loads(pipeline.definition())

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline.py in definition(self)
    299     def definition(self) -> str:
    300         """Converts a request structure to string representation for workflow service calls."""
--> 301         request_dict = self.to_request()
    302         request_dict["PipelineExperimentConfig"] = interpolate(
    303             request_dict["PipelineExperimentConfig"], {}, {}

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline.py in to_request(self)
     89             if self.pipeline_experiment_config is not None
     90             else None,
---> 91             "Steps": list_to_request(self.steps),
     92         }
     93 

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/utilities.py in list_to_request(entities)
     40     for entity in entities:
     41         if isinstance(entity, Entity):
---> 42             request_dicts.append(entity.to_request())
     43         elif isinstance(entity, StepCollection):
     44             request_dicts.extend(entity.request_dicts())

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/steps.py in to_request(self)
    324     def to_request(self) -> RequestType:
    325         """Updates the request dictionary with cache configuration."""
--> 326         request_dict = super().to_request()
    327         if self.cache_config:
    328             request_dict.update(self.cache_config.config)

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/steps.py in to_request(self)
    212     def to_request(self) -> RequestType:
    213         """Gets the request structure for `ConfigurableRetryStep`."""
--> 214         step_dict = super().to_request()
    215         if self.retry_policies:
    216             step_dict["RetryPolicies"] = self._resolve_retry_policy(self.retry_policies)

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/steps.py in to_request(self)
    101             "Name": self.name,
    102             "Type": self.step_type.value,
--> 103             "Arguments": self.arguments,
    104         }
    105         if self.depends_on:

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/steps.py in arguments(self)
    306         """
    307 
--> 308         self.estimator._prepare_for_training(self.job_name)
    309         train_args = _TrainingJob._get_train_args(
    310             self.estimator, self.inputs, experiment_config=dict()

/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in _prepare_for_training(self, job_name)
   2660                 constructor if applicable.
   2661         """
-> 2662         super(Framework, self)._prepare_for_training(job_name=job_name)
   2663 
   2664         self._validate_and_set_debugger_configs()

/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in _prepare_for_training(self, job_name)
    659                 self.code_uri = self.uploaded_code.s3_prefix
    660             else:
--> 661                 self.uploaded_code = self._stage_user_code_in_s3()
    662                 code_dir = self.uploaded_code.s3_prefix
    663                 script = self.uploaded_code.script_name

/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in _stage_user_code_in_s3(self)
    699             kms_key = None
    700         elif self.code_location is None:
--> 701             code_bucket, _ = parse_s3_url(self.output_path)
    702             code_s3_prefix = "{}/{}".format(self._current_job_name, "source")
    703             kms_key = self.output_kms_key

/opt/conda/lib/python3.7/site-packages/sagemaker/s3.py in parse_s3_url(url)
     37     parsed_url = urlparse(url)
     38     if parsed_url.scheme != "s3":
---> 39         raise ValueError("Expecting 's3' scheme, got: {} in {}.".format(parsed_url.scheme, url))
     40     return parsed_url.netloc, parsed_url.path.lstrip("/")
     41 

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/entities.py in __str__(self)
     79     def __str__(self):
     80         """Override built-in String function for PipelineVariable"""
---> 81         raise TypeError("Pipeline variables do not support __str__ operation.")
     82 
     83     def __int__(self):

TypeError: Pipeline variables do not support __str__ operation.
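The failure mechanism in the traceback can be reproduced outside SageMaker with a small sketch (the classes and functions below are hypothetical stand-ins, not the SDK's actual implementation): at definition time the parameter has no concrete S3 value, so the URL check fails, and building the error message forces str() on the variable, which is exactly the call that PipelineVariable.__str__ rejects.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for ParameterString: it subclasses str (so urlparse
# accepts it) but, like PipelineVariable, refuses eager string coercion,
# because its concrete value only exists at pipeline execution time.
class FakePipelineVariable(str):
    def __str__(self):
        raise TypeError("Pipeline variables do not support __str__ operation.")

def parse_s3_url(url):
    # Mirrors the shape of sagemaker.s3.parse_s3_url: the placeholder has
    # no "s3" scheme at definition time, and formatting the error message
    # calls str() on the variable, surfacing the TypeError from the trace.
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(
            "Expecting 's3' scheme, got: {} in {}.".format(parsed.scheme, str(url))
        )
    return parsed.netloc, parsed.path.lstrip("/")

try:
    parse_s3_url(FakePipelineVariable(""))
except TypeError as exc:
    print(exc)  # Pipeline variables do not support __str__ operation.
```

The point is that the crash happens while *serializing* the pipeline, long before any training job runs, because the code path assumes output_path is a plain string.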

Expected behavior The pipeline definition should be generated successfully.

Screenshots or logs N/A

System information:

  • SageMaker Python SDK version: 2.87.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): XGBoost
  • Framework version: 1.5-1
  • Python version: 3.7.10
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context N/A

Issue Analytics

  • State:closed
  • Created a year ago
  • Reactions:2
  • Comments:12 (4 by maintainers)

Top GitHub Comments

mouhannadali commented, May 30, 2022 (3 reactions)

Hi @jessieweiyi , I have the same problem. I am doing the following:

input_path = "s3://bucket/key/"
inputs = TrainingInput(s3_data=input_path)

step_train = TrainingStep(
    name="MyTrainingStep",
    estimator=estimator,
    inputs=inputs,
)

pipeline_name = f"TF2Workflow"

pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
                training_instance_type, 
                training_instance_count,
               ],
    steps=[step_train],
    sagemaker_session=sess
)

running pipeline.definition() will result in an error: Pipeline variables do not support __str__ operation. Please use .to_string() to convert it to string type in execution time or use .expr to translate it to Json for display purpose in Python SDK.

any idea?
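The error message above points at .to_string() and .expr. As a rough sketch of the idea behind .expr (the class here is a hypothetical stand-in, not the SDK's ParameterString), the variable is carried as a JSON "Get" expression that SageMaker substitutes at execution time, instead of being coerced to a Python string at definition time:

```python
# Hypothetical stand-in for a pipeline parameter: rather than holding a
# string value, it exposes .expr, a JSON structure resolved at execution
# time, and rejects eager str() coercion like the real PipelineVariable.
class FakeParameterString:
    def __init__(self, name):
        self.name = name

    @property
    def expr(self):
        return {"Get": "Parameters.{}".format(self.name)}

    def __str__(self):
        raise TypeError("Pipeline variables do not support __str__ operation.")

param = FakeParameterString("ModelOutputPath")
print(param.expr)  # {'Get': 'Parameters.ModelOutputPath'}
```

Any code path that str()-formats such a variable during definition (as parse_s3_url does) will hit the TypeError; code paths that emit .expr into the pipeline JSON will not.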

jerrypeng7773 commented, May 26, 2022 (0 reactions)

@rohangpatil, PR 3111 should fix this issue, can you try it out and let us know?
