[k8s] suppress connection error warnings when disconnected from k8s #3674

Open
wants to merge 4 commits into
base: master

Contributor

@asaiacai asaiacai commented Jun 18, 2024

Suppresses urllib3 warnings for sky show-gpus when the k8s cluster is disconnected. Closes #3591

before changes

(base) gcpuser@k3s-ebd1-head-82q3gnw3-compute:~/skypilot$ sky show-gpus --cloud kubernetes
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f75a56eeb60>: Failed to establish a new connection: [Errno 111] Connection refused')': /api/v1/namespaces/default/pods
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f75a56ef3a0>: Failed to establish a new connection: [Errno 111] Connection refused')': /api/v1/namespaces/default/pods
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f75a56ef6a0>: Failed to establish a new connection: [Errno 111] Connection refused')': /api/v1/namespaces/default/pods
No GPUs found in Kubernetes cluster. If your cluster contains GPUs, make sure nvidia.com/gpu resource is available on the nodes and the node labels for identifying GPUs (e.g., skypilot.co/accelerator) are setup correctly. To further debug, run: sky check

after changes

(base) gcpuser@k3s-ebd1-head-82q3gnw3-compute:~/skypilot$ sky show-gpus --cloud kubernetes
No GPUs found in Kubernetes cluster. If your cluster contains GPUs, make sure nvidia.com/gpu resource is available on the nodes and the node labels for identifying GPUs (e.g., skypilot.co/accelerator) are setup correctly. To further debug, run: sky check 
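
For readers following along, the effect shown above can be approximated with a minimal sketch. It assumes only that the warnings come from the standard `urllib3.connectionpool` logger, as the "before" output indicates; the helper name `quiet_urllib3` is hypothetical and is not necessarily what this PR implements.

```python
import logging


def quiet_urllib3(level: int = logging.ERROR) -> int:
    """Raise the threshold of the parent 'urllib3' logger.

    Hypothetical helper for illustration. Child loggers such as
    'urllib3.connectionpool' inherit the effective level as long as
    they have no level of their own, so WARNING-level retry messages
    are dropped while ERROR and above still pass through.
    Returns the previous level so the caller can restore it.
    """
    logger = logging.getLogger('urllib3')
    previous = logger.level
    logger.setLevel(level)
    return previous
```

Setting the level on the parent logger rather than on `urllib3.connectionpool` alone also covers other urllib3 submodules, which may or may not be desirable depending on how much output should remain visible.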

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)

To create a disconnected cluster, I installed and then uninstalled k3s on a GCE VM:

sky launch -c k3s --cloud gcp
ssh k3s
curl -sfL https://get.k3s.io | sh -
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown -R $USER ~/.kube
sky check
sky show-gpus --cloud kubernetes
bash /usr/local/bin/k3s-uninstall.sh
sky show-gpus --cloud kubernetes

format

set urllib3 log level

set to ERROR level
Collaborator

@Michaelvll Michaelvll left a comment

Thanks @asaiacai! This looks good to me. I will let @romilbhardwaj do a final check. :)

sky/provision/kubernetes/utils.py (outdated, resolved)
@asaiacai asaiacai changed the title suppress connection error warnings when disconnected from k8s [k8s] suppress connection error warnings when disconnected from k8s Jun 19, 2024
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Thanks @asaiacai!

sky/provision/logging.py (outdated, resolved)
sky/provision/kubernetes/utils.py (outdated, resolved)
@romilbhardwaj romilbhardwaj added this to the v0.6.1 milestone Jun 25, 2024
@asaiacai
Contributor Author

@romilbhardwaj I edited this so the top-level Kubernetes APIs are all wrapped in a logging context. I've attached some example commands and outputs after disconnecting here, including show-gpus, launch, exec, down, jobs launch, and serve up. One thing to note: the trace dumps for some of these are quite verbose. I'm not sure whether the intention is to catch these and replace them with something shorter, but let me know.
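
As an illustration of what wrapping API calls in a logging context could look like, here is a rough sketch. The names `suppress_urllib3_warnings` and `with_quiet_urllib3` are hypothetical, and the actual code in sky/provision/logging.py and sky/provision/kubernetes/utils.py may be structured differently; the only assumption taken from this thread is that the noisy messages are WARNING-level records from urllib3.

```python
import contextlib
import functools
import logging


@contextlib.contextmanager
def suppress_urllib3_warnings(level: int = logging.ERROR):
    """Temporarily raise the 'urllib3' logger threshold (hypothetical)."""
    logger = logging.getLogger('urllib3')
    previous = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        # Restore the original level even if the wrapped call raises.
        logger.setLevel(previous)


def with_quiet_urllib3(fn):
    """Decorator form, for wrapping top-level Kubernetes API helpers."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        with suppress_urllib3_warnings():
            return fn(*args, **kwargs)

    return wrapper
```

A scoped context like this only silences the retry warnings while the wrapped call runs. Exceptions still propagate unchanged, so the verbose trace dumps mentioned above would need separate handling.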

Successfully merging this pull request may close these issues.

[k8s] Suppress warnings when kubernetes cluster is not reachable
3 participants