Cache deployer fails if the cluster signer is not set
What steps did you take:
When deploying Kubeflow using kfctl_istio_dex.v1.1.0.yaml on a Charmed Kubernetes 1.19 cluster, the cache-server and cache-deployer-deployment pods get stuck in PodInitializing and CrashLoopBackOff respectively. The cache-server pod shows the error: MountVolume.SetUp failed for volume "webhook-tls-certs" : secret "webhook-server-tls" not found. Redeploying either or both of the pods does not fix the issue. The cache-deployer-deployment pod gives the following logs:
+ echo 'Start deploying cache service to existing cluster:'
+ NAMESPACE=kubeflow
+ MUTATING_WEBHOOK_CONFIGURATION_NAME=cache-webhook-kubeflow
+ WEBHOOK_SECRET_NAME=webhook-server-tls
Start deploying cache service to existing cluster:
+ kubectl get mutatingwebhookconfigurations cache-webhook-kubeflow --namespace kubeflow --ignore-not-found
+ kubectl get secrets webhook-server-tls --namespace kubeflow --ignore-not-found
+ webhook_config_exists=false
+ grep cache-webhook-kubeflow -w
+ webhook_secret_exists=false
+ grep webhook-server-tls -w
+ '[' false '==' true ]
+ '[' false '==' true ]
+ '[' false '==' true ]
+ export 'CA_FILE=ca_cert'
+ rm -f ca_cert
+ touch ca_cert
+ ./webhook-create-signed-cert.sh --namespace kubeflow --cert_output_path ca_cert --secret webhook-server-tls
+ [[ 6 -gt 0 ]]
+ case ${1} in
+ namespace=kubeflow
+ shift
+ shift
+ [[ 4 -gt 0 ]]
+ case ${1} in
+ cert_output_path=ca_cert
+ shift
+ shift
+ [[ 2 -gt 0 ]]
+ case ${1} in
+ secret=webhook-server-tls
+ shift
+ shift
+ [[ 0 -gt 0 ]]
+ '[' -z ']'
+ service=cache-server
+ '[' -z webhook-server-tls ']'
+ '[' -z kubeflow ']'
+ '[' -z ca_cert ']'
++ command -v openssl
+ '[' '!' -x /usr/bin/openssl ']'
+ csrName=cache-server.kubeflow
++ mktemp -d
+ tmpdir=/tmp/tmp.KGlEMA
+ echo 'creating certs in tmpdir /tmp/tmp.KGlEMA '
creating certs in tmpdir /tmp/tmp.KGlEMA
+ cat
+ openssl genrsa -out /tmp/tmp.KGlEMA/server-key.pem 2048
Generating RSA private key, 2048 bit long modulus (2 primes)
.......................................................................................+++++
...................................................................+++++
e is 65537 (0x010001)
+ openssl req -new -key /tmp/tmp.KGlEMA/server-key.pem -subj /CN=cache-server.kubeflow.svc -out /tmp/tmp.KGlEMA/server.csr -config /tmp/tmp.KGlEMA/csr.conf
+ echo 'start running kubectl...'
start running kubectl...
+ kubectl delete csr cache-server.kubeflow
certificatesigningrequest.certificates.k8s.io "cache-server.kubeflow" deleted
+ cat
+ kubectl create -f -
++ cat /tmp/tmp.KGlEMA/server.csr
++ base64
++ tr -d '\n'
certificatesigningrequest.certificates.k8s.io/cache-server.kubeflow created
+ true
+ kubectl get csr cache-server.kubeflow
NAME AGE SIGNERNAME REQUESTOR CONDITION
cache-server.kubeflow 0s kubernetes.io/legacy-unknown system:serviceaccount:kubeflow:kubeflow-pipelines-cache-deployer-sa Pending
+ '[' 0 -eq 0 ']'
+ break
+ kubectl certificate approve cache-server.kubeflow
No resources found
error: no kind "CertificateSigningRequest" is registered for version "certificates.k8s.io/v1" in scheme "k8s.io/kubernetes/pkg/kubectl/scheme/scheme.go:28"
The cache-server.kubeflow csr is stuck in a Pending condition. However, manually running kubectl certificate approve cache-server.kubeflow does work.
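As a stopgap, the manual approval described above could be scripted roughly as follows. This is only a sketch: csr_condition and approve_if_pending are hypothetical helper names, not part of the KFP deployer scripts, and it assumes a kubectl client recent enough to handle the cluster's CSR API version plus an account that is allowed to approve CSRs.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the KFP cache deployer scripts.

# Read `kubectl get csr` table output on stdin and print the CONDITION
# (last column) of the row whose NAME matches $1.
csr_condition() {
  awk -v name="$1" '$1 == name { print $NF }'
}

# Approve the CSR named $1 only if it is still Pending.
approve_if_pending() {
  csr="$1"
  if [ "$(kubectl get csr "$csr" | csr_condition "$csr")" = "Pending" ]; then
    kubectl certificate approve "$csr"
  fi
}

# Usage (needs cluster access):
#   approve_if_pending cache-server.kubeflow
```

Approval only marks the request as approved; a certificate is issued only if the cluster actually has a signer configured for it.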
The following pull requests seem to be related: https://github.com/openshift/oc/pull/501 https://github.com/openshift/installer/pull/3943
Environment:
Charmed Kubernetes 1.19 running on Ubuntu 20.04.1.
How did you deploy Kubeflow Pipelines (KFP)? full Kubeflow deployment
/kind bug /area backend
I think the issue is caused by the fact that signerName is a required field that is not set, and kubernetes.io/legacy-unknown has been removed from Kubernetes 1.19. It will need to be replaced by kubernetes.io/kube-apiserver-client, kubernetes.io/kube-apiserver-client-kubelet or kubernetes.io/kubelet-serving. https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/#kubernetes-signers
Hi @DavidSpek:
I am also getting the same error:
And I have no way of setting the --cluster-signing-cert-file and --cluster-signing-key-file flags from my side, as the Rancher Kubernetes deployment is managed elsewhere.
Is there an example of what the cert-manager approach entails?
I’m trying to deploy kubeflow v1.3-branch with kustomize.
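For reference, the signerName suggestion from the first comment could look roughly like the sketch below, which renders a CSR manifest with spec.signerName set explicitly. render_csr is a hypothetical helper, not something from the KFP scripts. The certificates.k8s.io/v1 API version and the usages list are assumptions, and which signer name is appropriate is left as a parameter, because the built-in signers each impose their own requirements on subject and usages (see the Kubernetes documentation linked above).

```shell
#!/bin/sh
# Hypothetical helper; not part of the KFP cache deployer scripts.
# Prints a CertificateSigningRequest manifest with an explicit signerName.
#   $1 = CSR object name, $2 = signer name, $3 = base64-encoded PEM CSR
render_csr() {
  csr_name="$1"
  signer="$2"
  csr_b64="$3"
  cat <<EOF
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: ${csr_name}
spec:
  signerName: ${signer}
  request: ${csr_b64}
  usages:
  - digital signature
  - key encipherment
  - server auth
EOF
}

# Usage (needs cluster access; SIGNER_NAME must be chosen for your cluster):
#   render_csr cache-server.kubeflow "$SIGNER_NAME" \
#     "$(base64 < server.csr | tr -d '\n')" | kubectl create -f -
```

The manifest is piped to kubectl create -f - in the same way the deployer script in the log above submits its CSR, so only the rendered document changes.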