
Eval-running often hangs on last sample #1384

Open
sjadler2004 opened this issue Oct 17, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@sjadler2004
Collaborator

Describe the bug

Relatively often, my eval run will reach, say, sample 199/200 and then hang on the last one for a very long time. It isn't clear to me why this occurs, but the hang sometimes persists for an hour or more, at which point I generally terminate the command from my CLI and try again.

To Reproduce

Unfortunately, I'm not sure how to make this happen every time. It does seem more likely to happen on larger sampling runs than on small ones, though.

Code snippets

No response

OS

macOS

Python version

Python v3.11

Library version

latest

sjadler2004 added the bug label Oct 17, 2023
@sjadler2004
Collaborator Author

Strangely, even after a KeyboardInterrupt, it often takes a while for my terminal to regain the ability to run normal commands after this error occurs. Not sure if that helps pin down the problem.

@LRudL commented Nov 28, 2023

I also have this issue. It is not about rate limits: it happens even on datasets that are definitely below the tokens-per-minute and requests-per-minute limits. However, it only seems to show up for large datasets.

An example of the traceback when I press Ctrl+C twice to exit after it has been stuck for a long time:

Traceback (most recent call last):
  File "/home/lrudl/miniconda3/envs/evalg2/lib/python3.10/multiprocessing/pool.py", line 856, in next
    item = self._items.popleft()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
[...]
  File "/home/lrudl/[...]/evals/evals/cli/oaieval.py", line 223, in run
    result = eval.run(recorder)
  File "/home/lrudl/[...]/evals/evals/elsuite/modelgraded/classify.py", line 107, in run
    self.eval_all_samples(recorder, samples)
  File "/home/lrudl/[...]/evals/evals/eval.py", line 146, in eval_all_samples
    idx_and_result = list(tqdm(iter, total=len(work_items), disable=not show_progress))
  File "/home/lrudl/miniconda3/envs/evalg2/lib/python3.10/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/home/lrudl/miniconda3/envs/evalg2/lib/python3.10/multiprocessing/pool.py", line 861, in next
    self._cond.wait(timeout)
  File "/home/lrudl/miniconda3/envs/evalg2/lib/python3.10/threading.py", line 320, in wait
    waiter.acquire()
KeyboardInterrupt


^CException ignored in: <module 'threading' from '/home/lrudl/miniconda3/envs/evalg2/lib/python3.10/threading.py'>
Traceback (most recent call last):
  File "/home/lrudl/miniconda3/envs/evalg2/lib/python3.10/threading.py", line 1537, in _shutdown
    atexit_call()
  File "/home/lrudl/miniconda3/envs/evalg2/lib/python3.10/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/home/lrudl/miniconda3/envs/evalg2/lib/python3.10/threading.py", line 1096, in join
    self._wait_for_tstate_lock()
  File "/home/lrudl/miniconda3/envs/evalg2/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt: 

Often all I need to do is try again a few times for it to eventually run all the way to completion, but:
(1) This massively increases the token cost.
(2) It makes it difficult to run many evals in sequence with a script, because you have to supervise the run manually and get it unstuck many times, which is a major time cost for big eval projects.

@katsuya commented Nov 29, 2023

It seems that this issue is influenced by a bug in tqdm, as discussed at tqdm/tqdm#627. Applying the following patch significantly improved the situation.

diff -urN a/.venv/lib/python3.11/site-packages/evals/eval.py b/.venv/lib/python3.11/site-packages/evals/eval.py
--- a/.venv/lib/python3.11/site-packages/evals/eval.py  2023-11-29 12:55:58.214648049 +0900
+++ b/.venv/lib/python3.11/site-packages/evals/eval.py  2023-11-29 12:56:05.630671841 +0900
@@ -143,7 +143,8 @@
             else:
                 logger.info(f"Running in threaded mode with {threads} threads!")
                 iter = pool.imap_unordered(eval_sample, work_items)
-            idx_and_result = list(tqdm(iter, total=len(work_items), disable=not show_progress))
+            # idx_and_result = list(tqdm(iter, total=len(work_items), disable=not show_progress))
+            idx_and_result = list(iter)
         return [r for _, r in sorted(idx_and_result)]

     def get_samples(self):
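
If you would rather keep the progress bar than drop it, an alternative is to drive tqdm manually instead of wrapping the pool iterator in it, which is the pattern tqdm/tqdm#627 flags. This is only a sketch under that assumption, not the library's code; run_all here just mirrors the names in eval.py:

from multiprocessing.pool import ThreadPool

from tqdm import tqdm


def run_all(eval_sample, work_items, threads=10, show_progress=True):
    # Collect (idx, result) pairs while updating the bar by hand, instead
    # of wrapping pool.imap_unordered in tqdm as eval.py originally does.
    idx_and_result = []
    with ThreadPool(threads) as pool:
        with tqdm(total=len(work_items), disable=not show_progress) as pbar:
            for item in pool.imap_unordered(eval_sample, work_items):
                idx_and_result.append(item)
                pbar.update(1)
    return [r for _, r in sorted(idx_and_result)]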

@isc-Shiva-Gupta

I also had this issue. A workaround I found is to set the EVALS_THREADS_TIMEOUT environment variable when running the command. It specifies the time allowed for each input to the model to run. It can be used as follows:

EVALS_THREADS_TIMEOUT=20 oaieval completion_fn eval_name
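
The underlying pattern is just a bounded wait on each model call. Here is a minimal concurrent.futures sketch of that pattern, not the evals implementation; call_model and TIMEOUT_SECONDS are placeholders:

from concurrent.futures import ThreadPoolExecutor, TimeoutError

TIMEOUT_SECONDS = 20  # mirrors EVALS_THREADS_TIMEOUT=20 above


def call_model(prompt):
    ...  # placeholder for the actual API request


def run_with_timeout(prompt):
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(call_model, prompt)
    try:
        return future.result(timeout=TIMEOUT_SECONDS)
    except TimeoutError:
        return None  # give up on this sample instead of hanging the whole run
    finally:
        # wait=False so a stuck request doesn't block shutdown; the worker
        # thread itself still lingers until call_model returns
        executor.shutdown(wait=False)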

etr2460 pushed a commit that referenced this issue Mar 25, 2024
As has been brought up before (#1384, #1292,
#270), evals suffer from a hanging
issue, where an evaluation run will hang for a very long time (if not
indefinitely) at the end of a run (say, on the 99th sample out of 100).

This PR addresses the issue by replacing a seemingly redundant
single-threaded thread creation that happened on every request,
nested inside the already multi-threaded eval loop. My impression is
that this nested multithreading was causing overhead that resulted in
the observed hanging.

I had also noticed this hanging issue in `EVALS_SEQUENTIAL=1` mode
(where it no longer occurs at the end, but instead randomly in the
middle of the run).

I was able to identify the source of this issue through debugging print
statements that ultimately pointed to the `request_with_timeout`
function as the culprit.

We have tested the new `request_with_timeout` code on a fork where we
have run multiple new and pre-existing evals, including with 3rd party
solvers, and found no change in behaviour or errors, and a clear
improvement on the hanging issue.
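
A simplified sketch of the before/after shape that commit describes (my reading of it, not the actual PR diff; make_request is a placeholder):

import threading


def request_with_timeout_old(make_request, timeout):
    # Old shape: spawn a dedicated thread per request, inside an eval loop
    # that is already multi-threaded, and wait on it with a timeout.
    result = {}

    def worker():
        result["value"] = make_request()

    t = threading.Thread(target=worker)
    t.start()
    t.join(timeout)  # if the request outlives the timeout, the thread dangles
    return result.get("value")


def request_with_timeout_new(make_request, timeout):
    # New shape: no nested per-request thread; make the request directly
    # and let the HTTP client enforce the timeout.
    return make_request(timeout=timeout)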