[BUG] NoCredentialsError: Unable to locate credentials On MLFLOW with remote tracking server


System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): centos
  • MLflow installed from (source or binary): pip install mlflow
  • MLflow version (run mlflow --version): 1.4.0
  • Python version: 3.7
  • Exact command to reproduce: mlflow server --default-artifact-root s3://bucket_name/ --host 0.0.0.0

Describe the problem

Basically, I’m running an MLflow server on an AWS EC2 instance. I have mlflow==1.4.0, boto3==1.10.28 and botocore==1.13.28. The idea is to have a remote server on EC2 and persist experiment artifacts on S3. From my local machine I run mlflow run sklearn_elasticnet_wine (the example in the mlflow repo), with the MLFLOW_TRACKING_URI env variable set to point to my EC2 instance. Everything works fine: it runs successfully and the artifacts are stored in S3. The problem is that when I access the UI and select a specific run, the server logs the following exceptions:

2019/11/27 19:26:44 ERROR mlflow.server: Exception on /ajax-api/2.0/preview/mlflow/model-versions/search [GET]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/lib/python2.7/site-packages/mlflow/server/handlers.py", line 137, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mlflow/server/handlers.py", line 581, in _search_model_versions
    model_versions_detailed = _get_model_registry_store().search_model_versions(
AttributeError: 'NoneType' object has no attribute 'search_model_versions'

2019/11/27 19:26:45 ERROR mlflow.server: Exception on /ajax-api/2.0/preview/mlflow/artifacts/list [GET]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/lib/python2.7/site-packages/mlflow/server/handlers.py", line 137, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mlflow/server/handlers.py", line 394, in _list_artifacts
    artifact_entities = _get_artifact_repo(run).list_artifacts(path)
  File "/usr/lib/python2.7/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 68, in list_artifacts
    for result in results:
  File "/usr/lib/python2.7/site-packages/botocore/paginate.py", line 255, in __iter__
    response = self._make_request(current_kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/paginate.py", line 332, in _make_request
    return self._method(**current_kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 648, in _make_api_call
    operation_model, request_dict, request_context)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 667, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 157, in sign
    auth.add_auth(request)
  File "/usr/lib/python2.7/site-packages/botocore/auth.py", line 425, in add_auth
    super(S3SigV4Auth, self).add_auth(request)
  File "/usr/lib/python2.7/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
NoCredentialsError: Unable to locate credentials

I have my AWS credentials set in the ec2 instance, both as env variables and in ~/.aws/credentials.
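
One way to narrow this down is to check which credential sources are visible to the process that actually runs the server, since a process started under another user or interpreter does not necessarily see the same environment or home directory as an interactive shell. The sketch below uses only the standard library and looks at just the two sources mentioned above (environment variables and the shared credentials file). It is not botocore's full lookup chain, and the function name `visible_credential_sources` is made up for illustration.

```python
import configparser
import os
from pathlib import Path


def visible_credential_sources(env=None, credentials_path=None, profile="default"):
    """Report which of the two credential sources this process can see.

    Hypothetical helper: it only mirrors the environment-variable and
    shared-credentials-file lookups, not the whole botocore provider chain.
    """
    env = os.environ if env is None else env
    if credentials_path is None:
        credentials_path = Path.home() / ".aws" / "credentials"

    # Source 1: both key variables must be present and non-empty.
    from_env = bool(env.get("AWS_ACCESS_KEY_ID")) and bool(
        env.get("AWS_SECRET_ACCESS_KEY")
    )

    # Source 2: the profile section must exist and define both keys.
    # Interpolation is disabled because secret keys may contain "%".
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_path)  # a missing file is silently skipped
    from_file = (
        parser.has_section(profile)
        and parser.has_option(profile, "aws_access_key_id")
        and parser.has_option(profile, "aws_secret_access_key")
    )
    return {"environment": from_env, "shared_file": from_file}


if __name__ == "__main__":
    print(visible_credential_sources())
```

Running this with the same user and interpreter that launch `mlflow server` shows whether either source is reachable from that process. If both come back `False`, the credentials are probably set for a different user or shell session.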

Can you guys please advise on this?

Thank you

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8

Top GitHub Comments

7 reactions
GewoonMaarten commented, Oct 28, 2021

Okay, I’ve figured it out. I was under the impression that the mlflow tracking server connected to S3, but this is not the case. This is indicated in the docs (https://mlflow.org/docs/latest/tracking.html#scenario-4-mlflow-with-remote-tracking-server-backend-and-artifact-stores) but I missed it.

The trainer directly connects to S3, so you need to add your credentials to it. I did it like this:

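import os

import mlflow

# The training client talks to S3 (here a MinIO server) directly,
# so it needs its own credentials and endpoint.
os.environ["AWS_ACCESS_KEY_ID"] = "minio"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minio123"
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://minio-server:9000"

mlflow.set_tracking_uri("http://mlflow-server:5000")
mlflow.pytorch.autolog()
# trainer, model and dm are defined elsewhere in the training script.
trainer.fit(model, dm)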

Also, boto3 does not allow underscores in the URL, so I had to adjust the Docker Compose file I linked earlier.
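
The underscore problem reported above can be flagged before any upload is attempted. The following is a minimal sketch, assuming the endpoint is given as a plain URL string. The function name `endpoint_host_problems` and the host name `minio_server` are hypothetical, and the check only inspects the URL without calling boto3.

```python
from urllib.parse import urlparse


def endpoint_host_problems(endpoint_url):
    """Return a list of human-readable problems with an S3 endpoint URL.

    Hypothetical pre-flight check; it does not contact the endpoint.
    """
    problems = []
    parsed = urlparse(endpoint_url)
    if parsed.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme: {parsed.scheme!r}")
    host = parsed.hostname or ""
    if not host:
        problems.append("no hostname found")
    elif "_" in host:
        problems.append(f"hostname contains an underscore: {host!r}")
    return problems


# "minio_server" is an assumed service name used only for illustration.
print(endpoint_host_problems("http://minio_server:9000"))
print(endpoint_host_problems("http://minio-server:9000"))
```

The same function could be called on the value of MLFLOW_S3_ENDPOINT_URL at the start of a training script, so that a badly named service shows up as a readable message instead of a failure deep inside the upload.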

2 reactions
pw24 commented, Oct 27, 2021

is the s3 access done on the client directly to s3 or via the mlflow server?

I would also like to confirm this. I am experiencing the same issue, and can confirm that from the tracking server pod it was possible to load a file out of an S3 bucket via boto3 (AWS creds injected via service account):

[Screenshot: output of the file, showing ‘does this work’.]

This means the AWS creds were loaded into the pod and boto3 had no issues with them.

However, when exposing the remote tracking server (kubectl port-forward) and pointing my local Python script to that remote tracking server (mlflow.set_tracking_uri), I can confirm that the experiment is created on the tracking server:

[Screenshot: the experiment shown on the tracking server.]

At the same time, I get botocore exceptions when trying to log the model:

  File "c:\xx\AppData\Local\Programs\Python\Python39\lib\site-packages\botocore\signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "c:\xx\AppData\Local\Programs\Python\Python39\lib\site-packages\botocore\signers.py", line 162, in sign
    auth.add_auth(request)
  File "c:\xx\AppData\Local\Programs\Python\Python39\lib\site-packages\botocore\auth.py", line 373, in add_auth
    raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials
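
Given the explanation in this thread that the training client connects to S3 directly, a client script can check the artifact location up front and warn when it will need credentials of its own. This is only a sketch under that assumption: the helper name `client_uploads_directly_to_s3` and the example URI are hypothetical, and the check looks at nothing but the URI scheme.

```python
from urllib.parse import urlparse


def client_uploads_directly_to_s3(artifact_uri):
    """Return True when the artifact location is an s3:// URI.

    Assumption taken from the discussion above: for such a location the
    client process performs the upload, so it needs credentials itself.
    """
    return urlparse(artifact_uri).scheme == "s3"


# Hypothetical artifact location; with MLflow installed, the value for the
# active run could come from mlflow.get_artifact_uri().
uri = "s3://bucket_name/0/abc123/artifacts"
if client_uploads_directly_to_s3(uri):
    print("This process needs AWS credentials before logging artifacts.")
```

With a check like this, a missing credential setup on the local machine is reported before training starts, instead of surfacing as a NoCredentialsError when the model is logged.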


