Cache deployer fails if the cluster signer is not set

What steps did you take:

When deploying Kubeflow using kfctl_istio_dex.v1.1.0.yaml on a Charmed Kubernetes 1.19 cluster, the cache-server and cache-deployer-deployment pods get stuck in PodInitializing and CrashLoopBackOff respectively. The cache-server pod shows the error MountVolume.SetUp failed for volume "webhook-tls-certs" : secret "webhook-server-tls" not found. Redeploying either or both of the pods does not fix the issue. The cache-deployer-deployment pod gives the following logs:

+ echo 'Start deploying cache service to existing cluster:'
+ NAMESPACE=kubeflow
+ MUTATING_WEBHOOK_CONFIGURATION_NAME=cache-webhook-kubeflow
+ WEBHOOK_SECRET_NAME=webhook-server-tls
Start deploying cache service to existing cluster:
+ kubectl get mutatingwebhookconfigurations cache-webhook-kubeflow --namespace kubeflow --ignore-not-found
+ kubectl get secrets webhook-server-tls --namespace kubeflow --ignore-not-found
+ webhook_config_exists=false
+ grep cache-webhook-kubeflow -w
+ webhook_secret_exists=false
+ grep webhook-server-tls -w
+ '[' false '==' true ]
+ '[' false '==' true ]
+ '[' false '==' true ]
+ export 'CA_FILE=ca_cert'
+ rm -f ca_cert
+ touch ca_cert
+ ./webhook-create-signed-cert.sh --namespace kubeflow --cert_output_path ca_cert --secret webhook-server-tls
+ [[ 6 -gt 0 ]]
+ case ${1} in
+ namespace=kubeflow
+ shift
+ shift
+ [[ 4 -gt 0 ]]
+ case ${1} in
+ cert_output_path=ca_cert
+ shift
+ shift
+ [[ 2 -gt 0 ]]
+ case ${1} in
+ secret=webhook-server-tls
+ shift
+ shift
+ [[ 0 -gt 0 ]]
+ '[' -z ']'
+ service=cache-server
+ '[' -z webhook-server-tls ']'
+ '[' -z kubeflow ']'
+ '[' -z ca_cert ']'
++ command -v openssl
+ '[' '!' -x /usr/bin/openssl ']'
+ csrName=cache-server.kubeflow
++ mktemp -d
+ tmpdir=/tmp/tmp.KGlEMA
+ echo 'creating certs in tmpdir /tmp/tmp.KGlEMA '
creating certs in tmpdir /tmp/tmp.KGlEMA 
+ cat
+ openssl genrsa -out /tmp/tmp.KGlEMA/server-key.pem 2048
Generating RSA private key, 2048 bit long modulus (2 primes)
.......................................................................................+++++
...................................................................+++++
e is 65537 (0x010001)
+ openssl req -new -key /tmp/tmp.KGlEMA/server-key.pem -subj /CN=cache-server.kubeflow.svc -out /tmp/tmp.KGlEMA/server.csr -config /tmp/tmp.KGlEMA/csr.conf
+ echo 'start running kubectl...'
start running kubectl...
+ kubectl delete csr cache-server.kubeflow
certificatesigningrequest.certificates.k8s.io "cache-server.kubeflow" deleted
+ cat
+ kubectl create -f -
++ cat /tmp/tmp.KGlEMA/server.csr
++ base64
++ tr -d '\n'
certificatesigningrequest.certificates.k8s.io/cache-server.kubeflow created
+ true
+ kubectl get csr cache-server.kubeflow
NAME                    AGE   SIGNERNAME                     REQUESTOR                                                             CONDITION
cache-server.kubeflow   0s    kubernetes.io/legacy-unknown   system:serviceaccount:kubeflow:kubeflow-pipelines-cache-deployer-sa   Pending
+ '[' 0 -eq 0 ']'
+ break
+ kubectl certificate approve cache-server.kubeflow
No resources found
error: no kind "CertificateSigningRequest" is registered for version "certificates.k8s.io/v1" in scheme "k8s.io/kubernetes/pkg/kubectl/scheme/scheme.go:28"

The cache-server.kubeflow CSR is stuck in the Pending condition because the approve step in the script fails with the error above. However, manually running kubectl certificate approve cache-server.kubeflow does work.

The following pull requests seem to be related: https://github.com/openshift/oc/pull/501 https://github.com/openshift/installer/pull/3943

Environment:

Charmed Kubernetes 1.19 running on Ubuntu 20.04.1.

How did you deploy Kubeflow Pipelines (KFP)? full Kubeflow deployment

/kind bug
/area backend

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 20 (14 by maintainers)

Top GitHub Comments

3 reactions
DavidSpek commented, Nov 20, 2020

I think the issue is caused by the fact that signerName is a required field that is not set, and kubernetes.io/legacy-unknown has been removed from Kubernetes 1.19. It will need to be replaced by kubernetes.io/kube-apiserver-client, kubernetes.io/kube-apiserver-client-kubelet or kubernetes.io/kubelet-serving. See https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/#kubernetes-signers
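
To make the signerName point concrete, here is a minimal sketch of a CSR manifest that sets the field explicitly. The helper name render_csr and its argument order are hypothetical and are not part of the cache deployer scripts. The signer passed in is only an example: each built-in signer restricts which subjects and usages it will sign, so the choice has to be checked against the linked documentation.

```shell
#!/bin/sh
# Hypothetical helper (not from the KFP scripts): print a
# certificates.k8s.io/v1 CertificateSigningRequest manifest with an
# explicit signerName, which the v1 API requires.
render_csr() {
  csr_name="$1"     # e.g. cache-server.kubeflow
  request_b64="$2"  # base64-encoded PEM CSR on a single line
  signer_name="$3"  # assumption: caller picks a signer valid for the request
  cat <<EOF
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: ${csr_name}
spec:
  signerName: ${signer_name}
  request: ${request_b64}
  usages:
  - digital signature
  - key encipherment
  - server auth
EOF
}
```

Assuming server.csr is the request generated by openssl as in the log above, the output could be piped to the cluster with something like: render_csr cache-server.kubeflow "$(base64 < server.csr | tr -d '\n')" kubernetes.io/kubelet-serving | kubectl create -f -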

2 reactions
grmoktan commented, Jul 23, 2021

Hi @DavidSpek :

I am also getting the same error:

+ echo 'ERROR: After approving csr cache-server.kubeflow, the signed certificate did not appear on the resource. Giving up after 10 attempts.'
ERROR: After approving csr cache-server.kubeflow, the signed certificate did not appear on the resource. Giving up after 10 attempts.
+ exit 1

I also have no way of setting --cluster-signing-cert-file and --cluster-signing-key-file on my side, as the Rancher Kubernetes deployment is managed elsewhere.

Is there an example of what the cert-manager approach entails?

I’m trying to deploy kubeflow v1.3-branch with kustomize.
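
For readers with the same question, a rough sketch of what a cert-manager based setup might look like follows. It is not taken from this thread or from the Kubeflow manifests: the helper name render_cert_manager_manifests and the resource names are hypothetical, and it assumes cert-manager is already installed in the cluster. The sketch only prints a self-signed Issuer and a Certificate that writes into the webhook-server-tls secret. Note that cert-manager stores the pair under the keys tls.crt and tls.key, and the thread does not confirm that the cache server reads those key names.

```shell
#!/bin/sh
# Hypothetical helper (not from the Kubeflow manifests): print a
# self-signed cert-manager Issuer plus a Certificate for the cache
# webhook service. Assumes cert-manager (cert-manager.io/v1) is installed.
render_cert_manager_manifests() {
  cm_namespace="$1"  # e.g. kubeflow
  cm_service="$2"    # e.g. cache-server
  cm_secret="$3"     # e.g. webhook-server-tls
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: ${cm_service}-selfsigned
  namespace: ${cm_namespace}
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ${cm_service}-cert
  namespace: ${cm_namespace}
spec:
  secretName: ${cm_secret}
  dnsNames:
  - ${cm_service}.${cm_namespace}.svc
  issuerRef:
    kind: Issuer
    name: ${cm_service}-selfsigned
EOF
}
```

The output could be applied with: render_cert_manager_manifests kubeflow cache-server webhook-server-tls | kubectl apply -f -. The mutating webhook configuration would still need the CA bundle, which cert-manager can inject through its cert-manager.io/inject-ca-from annotation.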
