Passing Pipeline Variable for entry_point while using XGBoost Estimator in script mode fails
Describe the bug
Passing a pipeline variable (e.g. an S3 URL `ParameterString`) as the estimator's `output_path` while using the XGBoost estimator in script mode fails to generate a pipeline definition: when staging the `entry_point` script, the estimator tries to parse the variable as an S3 URL and fails.
To reproduce
```python
model_output_path = ParameterString(name="ModelOutputPath")

xgb_script_mode_estimator = XGBoost(
    entry_point=script_path,
    framework_version="1.5-1",  # Note: framework_version is mandatory
    role=role,
    instance_count=training_instance_count,
    instance_type=training_instance_type,
    output_path=model_output_path,
    hyperparameters={
        "eval_metric": eval_metric,
        "min_child_weight": min_child_weight,
    },
    checkpoint_s3_uri=checkpoint_path,
)

train_input = TrainingInput(
    s3_processed_training_samples_url, content_type=content_type
)
validation_input = TrainingInput(
    s3_processed_testing_samples_url, content_type=content_type
)

step_train = TrainingStep(
    name="TrainModel",
    estimator=xgb_script_mode_estimator,
    inputs={"train": train_input, "validation": validation_input},
)

pipeline = get_pipeline(
    region=region,
    role=role,
    default_bucket=default_bucket,
    model_package_group_name=model_package_group_name,
    pipeline_name=pipeline_name,
)

import json

json.loads(pipeline.definition())
```
which generates the following error:
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-7d78ecac635c> in <module>
1 import json
----> 2 json.loads(pipeline.definition())
/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline.py in definition(self)
299 def definition(self) -> str:
300 """Converts a request structure to string representation for workflow service calls."""
--> 301 request_dict = self.to_request()
302 request_dict["PipelineExperimentConfig"] = interpolate(
303 request_dict["PipelineExperimentConfig"], {}, {}
/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline.py in to_request(self)
89 if self.pipeline_experiment_config is not None
90 else None,
---> 91 "Steps": list_to_request(self.steps),
92 }
93
/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/utilities.py in list_to_request(entities)
40 for entity in entities:
41 if isinstance(entity, Entity):
---> 42 request_dicts.append(entity.to_request())
43 elif isinstance(entity, StepCollection):
44 request_dicts.extend(entity.request_dicts())
/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/steps.py in to_request(self)
324 def to_request(self) -> RequestType:
325 """Updates the request dictionary with cache configuration."""
--> 326 request_dict = super().to_request()
327 if self.cache_config:
328 request_dict.update(self.cache_config.config)
/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/steps.py in to_request(self)
212 def to_request(self) -> RequestType:
213 """Gets the request structure for `ConfigurableRetryStep`."""
--> 214 step_dict = super().to_request()
215 if self.retry_policies:
216 step_dict["RetryPolicies"] = self._resolve_retry_policy(self.retry_policies)
/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/steps.py in to_request(self)
101 "Name": self.name,
102 "Type": self.step_type.value,
--> 103 "Arguments": self.arguments,
104 }
105 if self.depends_on:
/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/steps.py in arguments(self)
306 """
307
--> 308 self.estimator._prepare_for_training(self.job_name)
309 train_args = _TrainingJob._get_train_args(
310 self.estimator, self.inputs, experiment_config=dict()
/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in _prepare_for_training(self, job_name)
2660 constructor if applicable.
2661 """
-> 2662 super(Framework, self)._prepare_for_training(job_name=job_name)
2663
2664 self._validate_and_set_debugger_configs()
/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in _prepare_for_training(self, job_name)
659 self.code_uri = self.uploaded_code.s3_prefix
660 else:
--> 661 self.uploaded_code = self._stage_user_code_in_s3()
662 code_dir = self.uploaded_code.s3_prefix
663 script = self.uploaded_code.script_name
/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in _stage_user_code_in_s3(self)
699 kms_key = None
700 elif self.code_location is None:
--> 701 code_bucket, _ = parse_s3_url(self.output_path)
702 code_s3_prefix = "{}/{}".format(self._current_job_name, "source")
703 kms_key = self.output_kms_key
/opt/conda/lib/python3.7/site-packages/sagemaker/s3.py in parse_s3_url(url)
37 parsed_url = urlparse(url)
38 if parsed_url.scheme != "s3":
---> 39 raise ValueError("Expecting 's3' scheme, got: {} in {}.".format(parsed_url.scheme, url))
40 return parsed_url.netloc, parsed_url.path.lstrip("/")
41
/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/entities.py in __str__(self)
79 def __str__(self):
80 """Override built-in String function for PipelineVariable"""
---> 81 raise TypeError("Pipeline variables do not support __str__ operation.")
82
83 def __int__(self):
TypeError: Pipeline variables do not support __str__ operation.
```
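The root cause is visible in the last two frames: `parse_s3_url` interpolates the pipeline variable into an error message via `.format()`, which falls back to `str()` and triggers the forbidden `__str__`. A minimal, SDK-free sketch of the mechanism (the class and function below are simplified stand-ins for illustration, not the real sagemaker code):

```python
from urllib.parse import urlparse


class FakeParameterString(str):
    """Stand-in for sagemaker's ParameterString, which (in SDK 2.x)
    subclasses str so it can pass type checks, but forbids string
    coercion because its value is only known at execution time."""

    def __str__(self):
        raise TypeError("Pipeline variables do not support __str__ operation.")


def parse_s3_url(url):
    # Simplified copy of the logic in sagemaker.s3.parse_s3_url
    parsed_url = urlparse(url)
    if parsed_url.scheme != "s3":
        # .format() falls back to str(url) here, which raises the TypeError
        raise ValueError(
            "Expecting 's3' scheme, got: {} in {}.".format(parsed_url.scheme, url)
        )
    return parsed_url.netloc, parsed_url.path.lstrip("/")


output_path = FakeParameterString("placeholder")  # value unknown until run time
try:
    parse_s3_url(output_path)
except TypeError as exc:
    print(exc)  # Pipeline variables do not support __str__ operation.
```

The `ValueError` is never raised: building its message already str-coerces the variable, so the `TypeError` from `__str__` surfaces instead, exactly as in the traceback above.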
Expected behavior
The pipeline definition should be generated successfully.
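The branch shown in `_stage_user_code_in_s3` above suggests a possible workaround: pass an explicit, plain-string `code_location` to the estimator (e.g. `XGBoost(..., code_location="s3://my-code-bucket/code")`, where the bucket name is a placeholder), so that `output_path` is never parsed at definition time. A sketch of that branch, using a simplified stand-in rather than the real SDK code:

```python
from urllib.parse import urlparse


def parse_s3_url(url):
    # Simplified copy of sagemaker.s3.parse_s3_url
    parsed_url = urlparse(url)
    if parsed_url.scheme != "s3":
        raise ValueError(
            "Expecting 's3' scheme, got: {} in {}.".format(parsed_url.scheme, url)
        )
    return parsed_url.netloc, parsed_url.path.lstrip("/")


def stage_user_code(code_location, output_path):
    """Mirrors the branch in Framework._stage_user_code_in_s3: output_path
    is only parsed (and thus str-coerced) when code_location is None."""
    if code_location is not None:
        return parse_s3_url(code_location)
    return parse_s3_url(output_path)


# With a concrete code_location, the pipeline-variable output_path is never touched:
bucket, key_prefix = stage_user_code("s3://my-code-bucket/code", output_path=object())
print(bucket, key_prefix)  # my-code-bucket code
```

This only sidesteps the particular `parse_s3_url` call in the traceback; whether the pipeline variable is handled correctly later in the request serialization still depends on the SDK version.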
Screenshots or logs: N/A
System information:
- SageMaker Python SDK version: 2.87.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): XGBoost
- Framework version: 1.5-1
- Python version: 3.7.10
- CPU or GPU: CPU
- Custom Docker image (Y/N): N
Additional context: N/A
Hi @jessieweiyi, I have the same problem: running `pipeline.definition()` results in the error
```
Pipeline variables do not support __str__ operation. Please use `.to_string()` to convert it to string type in execution time or use `.expr` to translate it to Json for display purpose in Python SDK.
```

Any idea?
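For context on that error message: every pipeline variable exposes an `.expr` property that serializes it to the `Get` expression the workflow service resolves at execution time, which is what the SDK recommends for display purposes. A rough stand-in (not the SDK class) illustrating the shape `.expr` produces:

```python
class FakeParameterString:
    """Rough stand-in for sagemaker.workflow.parameters.ParameterString,
    shown only to illustrate what .expr produces."""

    def __init__(self, name):
        self.name = name

    @property
    def expr(self):
        # Serialized placeholder resolved by the workflow service at run time
        return {"Get": "Parameters.{}".format(self.name)}


model_output_path = FakeParameterString(name="ModelOutputPath")
print(model_output_path.expr)  # {'Get': 'Parameters.ModelOutputPath'}
```

In the generated pipeline definition, the parameter appears as such a `Get` expression rather than a concrete string.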
@rohangpatil, PR 3111 should fix this issue, can you try it out and let us know?