Scheduler "stops" after a while #5485

Open
markretallack opened this issue May 21, 2024 · 1 comment

Comments

@markretallack

Environment details:

  • AWS EKS 1.29
  • Tag 7863e27 for scheduler, controller, invoker

Steps to reproduce the issue:

  1. Deploy with the scheduler enabled, etc.
  2. Deploy some functions (cron-based, etc.).

Provide the expected results and outputs:

The system works normally

Provide the actual results and outputs:

After a while (about an hour), the system becomes unstable.

In the controller log:

[2024-05-21T13:00:09.119Z] [ERROR] [#tid_kDcLqa1uLqzR7GhLLdejqhGeS3mbeHQo] [] Failed to recreate queue for dataspace/ncarpark/[email protected], no scheduler endpoint available

Also seeing this in the controller log:

[2024-05-21T13:00:06.174Z] [WARN] [#tid_kDcLqa1uLqzR7GhLLdejqhGeS3mbeHQo] [] The whisk/queue/dataspace/dataspace/carpark/carpark/leader is deleted from ETCD, but there are still unhandled activations for this action, try to create a new queue

In the scheduler log I am seeing this:

[2024-05-21T13:00:09.876Z] [WARN] [#tid_sid_unknown] [EtcdWorker] a lease is expired while registering an initial data whisk/queue/dataspace/dataspace/carpark/carpark/leader, reissue it: io.grpc.StatusRuntimeException: NOT_FOUND: etcdserver: requested lease not found

And also:

[2024-05-21T13:00:10.195Z] [WARN] [#tid_sid_unknown] [EtcdWorker] a lease is expired while registering an initial data whisk/scheduler/0, reissue it: io.grpc.StatusRuntimeException: NOT_FOUND: etcdserver: requested lease not found
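
To line up the controller and scheduler events by time, the log lines above can be pulled apart with a short script. This is only a minimal sketch, assuming the `[timestamp] [LEVEL] [#tid] [component] message` layout seen in the excerpts; `parse_line` and `lease_expired_keys` are hypothetical helper names for illustration, not part of the project.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts above:
# [timestamp] [LEVEL] [#tid] [component-or-empty] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] "
    r"\[(?P<tid>[^\]]*)\] \[(?P<component>[^\]]*)\] (?P<msg>.*)$"
)

# Captures the ETCD key named in the scheduler's "lease is expired" warning.
LEASE_RE = re.compile(
    r"a lease is expired while registering an initial data (?P<key>\S+?), reissue it"
)


def parse_line(line):
    """Split one log line into its fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return record


def lease_expired_keys(lines):
    """Return (timestamp, key) pairs for lease-expiry warnings, oldest first."""
    hits = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        match = LEASE_RE.search(record["msg"])
        if match:
            hits.append((record["ts"], match.group("key")))
    return sorted(hits)
```

Feeding the scheduler log through `lease_expired_keys` gives a time-ordered list of the keys whose leases were reported as expired, which can then be compared against the timestamps of the controller errors.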

I am not sure where to look to diagnose this issue.

@markretallack
Author

My current workaround is to disable the new scheduler until I can find the cause.
