[BUG] NoCredentialsError: Unable to locate credentials On MLFLOW with remote tracking server
System information
- Have I written custom code (as opposed to using a stock example script provided in MLflow):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): centos
- MLflow installed from (source or binary): pip install mlflow
- MLflow version (run mlflow --version): 1.4.0
- Python version: 3.7
- Exact command to reproduce: mlflow server --default-artifact-root s3://bucket_name/ --host 0.0.0.0
Describe the problem
Basically, I’m running an MLflow server on an AWS EC2 instance. I have mlflow==1.4.0, boto3==1.10.28 and botocore==1.13.28. The idea is to have a remote server on EC2 and persist experiment artifacts on S3. From my local machine I run mlflow run sklearn_elasticnet_wine (the example in the MLflow repo), with the MLFLOW_TRACKING_URI env variable set to point to my EC2 instance. Everything works fine: it runs successfully and the artifacts are stored in S3. The problem is that when I access the UI and select a specific run, the server throws an exception:
2019/11/27 19:26:44 ERROR mlflow.server: Exception on /ajax-api/2.0/preview/mlflow/model-versions/search [GET]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/lib/python2.7/site-packages/mlflow/server/handlers.py", line 137, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mlflow/server/handlers.py", line 581, in _search_model_versions
    model_versions_detailed = _get_model_registry_store().search_model_versions(
AttributeError: 'NoneType' object has no attribute 'search_model_versions'
2019/11/27 19:26:45 ERROR mlflow.server: Exception on /ajax-api/2.0/preview/mlflow/artifacts/list [GET]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/lib/python2.7/site-packages/mlflow/server/handlers.py", line 137, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mlflow/server/handlers.py", line 394, in _list_artifacts
    artifact_entities = _get_artifact_repo(run).list_artifacts(path)
  File "/usr/lib/python2.7/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 68, in list_artifacts
    for result in results:
  File "/usr/lib/python2.7/site-packages/botocore/paginate.py", line 255, in __iter__
    response = self._make_request(current_kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/paginate.py", line 332, in _make_request
    return self._method(**current_kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 648, in _make_api_call
    operation_model, request_dict, request_context)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 667, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 157, in sign
    auth.add_auth(request)
  File "/usr/lib/python2.7/site-packages/botocore/auth.py", line 425, in add_auth
    super(S3SigV4Auth, self).add_auth(request)
  File "/usr/lib/python2.7/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
NoCredentialsError: Unable to locate credentials
I have my AWS credentials set in the ec2 instance, both as env variables and in ~/.aws/credentials.
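One way to narrow this down is to check which static credential sources are visible to the process that actually runs the server. The traceback paths point to /usr/lib/python2.7, so the server may be running under a different interpreter, user, or environment than the shell where the credentials were set up. Below is a minimal sketch using only the standard library. The function name find_credential_sources is made up for illustration. It only looks at the standard AWS environment variables and the shared credentials file, not at instance roles or anything else boto3 may consult.

```python
import configparser
import os
from pathlib import Path


def find_credential_sources(env=None, home=None):
    """Return the static AWS credential sources visible to this process.

    Hypothetical helper: it checks only environment variables and the
    shared credentials file, which is a subset of what boto3 looks at.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []

    # Source 1: access key pair exported in the process environment.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment variables")

    # Source 2: shared credentials file, by default ~/.aws/credentials.
    default_file = home / ".aws" / "credentials"
    cred_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE", default_file))
    parser = configparser.ConfigParser()
    if parser.read(cred_file):  # returns [] when the file is missing
        profile = env.get("AWS_PROFILE", "default")
        if parser.has_option(profile, "aws_access_key_id"):
            sources.append(f"{cred_file} [{profile}]")

    return sources


if __name__ == "__main__":
    found = find_credential_sources()
    print(found if found else "no static credentials visible to this process")
```

Running this with the same user and interpreter that launch mlflow server shows whether that process sees the same environment variables and home directory as the interactive shell.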
Can you guys please advise on this?
Thank you
Okay, I’ve figured it out. I was under the impression that the mlflow tracking server connected to S3, but this is not the case. This is indicated in the docs (https://mlflow.org/docs/latest/tracking.html#scenario-4-mlflow-with-remote-tracking-server-backend-and-artifact-stores) but I missed it.
The trainer directly connects to S3, so you need to add your credentials to it. I did it like this:
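The snippet from this comment is not preserved, so what follows is not the commenter's code. It is a rough sketch of one way to hand credentials to the client side, assuming the standard AWS environment variables are used. The helper names build_client_env and run_project are hypothetical, and the key values and tracking URI are placeholders.

```python
import os
import subprocess


def build_client_env(tracking_uri, access_key_id, secret_access_key, base_env=None):
    """Return a copy of the environment with MLflow and AWS settings added.

    Hypothetical helper: the variable names are the standard ones, the
    values passed in are whatever applies to your setup.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MLFLOW_TRACKING_URI"] = tracking_uri        # remote tracking server
    env["AWS_ACCESS_KEY_ID"] = access_key_id         # used by boto3 on the client
    env["AWS_SECRET_ACCESS_KEY"] = secret_access_key
    return env


def run_project(project_dir, env):
    """Launch `mlflow run <project_dir>` with the prepared environment."""
    return subprocess.run(["mlflow", "run", project_dir], env=env, check=True)


if __name__ == "__main__":
    client_env = build_client_env(
        "http://tracking-server.example:5000",  # placeholder URI
        "YOUR_ACCESS_KEY_ID",                   # placeholder
        "YOUR_SECRET_ACCESS_KEY",               # placeholder
    )
    run_project("sklearn_elasticnet_wine", client_env)
```

Passing the variables through a copied environment keeps the credentials scoped to the training process instead of exporting them globally.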
Also, boto3 does not allow underscores in the URL, so I had to adjust the docker compose I linked earlier.

Would also like to confirm this. I am experiencing the same issue, and can confirm that, from the tracking server pod, it was possible to load a file out of an S3 bucket via boto3 (AWS creds injected via service account):
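The original snippet for this check is not preserved. A check of this kind could look roughly like the sketch below, which assumes boto3 is installed in the environment being tested. The helper names split_s3_uri and can_read_object are made up for illustration, and the URI in the usage line is a placeholder.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key/path' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def can_read_object(uri):
    """Return True if boto3 can fetch the object's metadata from here."""
    # Imported lazily so split_s3_uri stays usable without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError, NoCredentialsError

    bucket, key = split_s3_uri(uri)
    try:
        boto3.client("s3").head_object(Bucket=bucket, Key=key)
    except NoCredentialsError:
        print("boto3 could not locate credentials in this environment")
        return False
    except ClientError as err:
        print(f"S3 rejected the request: {err}")
        return False
    return True


if __name__ == "__main__":
    # Placeholder URI: substitute a real object from your artifact bucket.
    print(can_read_object("s3://my-bucket/path/to/artifact.txt"))
```

Running the same check from both the tracking server pod and the client machine shows which side is missing credentials.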
However, when exposing the remote tracking server (kubectl port-forward), and pointing my local python script to that remote tracking server (mlflow.set_tracking_uri), I can confirm that the experiment is created on the tracking server: