
[Bug]: litellm --test returns Internal Server Error - 'async_generator' object is not iterable #4173

Open
shuther opened this issue Jun 13, 2024 · 0 comments
Labels
bug Something isn't working


shuther commented Jun 13, 2024

What happened?

litellm runs as a proxy in a docker compose setup and works fine, but litellm --test returns an internal error.
It may simply be that ollama can't handle the number of requests, but the error message looks strange.
The gpt-3.5 -> ollama/wizardlm2 connection itself works fine as per my own test; a rough equivalent of the failing request is sketched below.
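
For reference, the step where litellm --test fails is the plain (non-streaming) completion it sends after the streaming chat test succeeds (see the traceback further down). A rough manual equivalent using the openai client against the proxy, with placeholder host, port and key (assumptions, not taken from the report), would be:

import openai

# Assumption: the proxy listens on localhost:4000 and accepts any virtual key.
client = openai.OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# Streaming chat completion -- this part succeeds in the log below.
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "write a short poem"}],
    stream=True,
)
for chunk in stream:
    pass

# Non-streaming text completion -- this is the request that comes back
# as openai.InternalServerError in the traceback below.
response = client.completions.create(model="gpt-3.5-turbo", prompt="write a short poem")
print(response)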


Relevant section of the config.yaml:

litellm_settings: 
  drop_params: True
  num_retries: 3 
  request_timeout: 50 # raise Timeout error if a call takes longer than 50s. Sets litellm.request_timeout
  allowed_fails: 3 # cooldown a model if it fails more than 3 calls in a minute.

# Debug logging; set to False to keep debug output out of the logs
  set_verbose: True
  cache: False          # response caching is disabled; set to True to enable it (litellm defaults to a redis cache)

  cache_params:        # set cache params for redis
    type: redis
    namespace: "litellm_caching"

  json_logs: False

# Enable Traceloop
  success_callback: ["traceloop"]

  success_callback: ["prometheus"]
  failure_callback: ["prometheus"]
  service_callback: ["prometheus_system"]
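
A side observation on this config, not necessarily related to the error: litellm_settings defines success_callback twice. If the file is parsed with a standard YAML loader such as PyYAML (an assumption here), the later value silently replaces the earlier one, so the traceloop callback is dropped:

import yaml

doc = """
litellm_settings:
  success_callback: ["traceloop"]
  success_callback: ["prometheus"]
"""
# PyYAML keeps only the last occurrence of a duplicated key, without warning.
print(yaml.safe_load(doc)["litellm_settings"]["success_callback"])
# -> ['prometheus']  (the earlier ["traceloop"] entry is discarded)

Merging them into a single line, success_callback: ["traceloop", "prometheus"], would keep both callbacks active.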

Relevant log output

Log returned by litellm --test, run with: docker compose exec litellm litellm --test
...
LiteLLM: streaming response from proxy ChatCompletionChunk(id='chatcmpl-ab2b950a-2531-48d7-8ef2-64d10bd1d7aa', choices=[Choice(delta=ChoiceDelta(content='.', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1718298799, model='wizardlm2', object='chat.completion.chunk', system_fingerprint=None, usage=None)
LiteLLM: streaming response from proxy ChatCompletionChunk(id='chatcmpl-ab2b950a-2531-48d7-8ef2-64d10bd1d7aa', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1718298799, model='wizardlm2', object='chat.completion.chunk', system_fingerprint=None, usage=None)

 making completion request to proxy
Traceback (most recent call last):
  File "/usr/local/bin/litellm", line 8, in <module>
    sys.exit(run_server())
             ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/proxy_cli.py", line 375, in run_server
    response = client.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 277, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/completions.py", line 528, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1240, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 921, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1005, in _request
    return self._retry_request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1053, in _retry_request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1005, in _request
    return self._retry_request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1053, in _retry_request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1020, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Internal Server Error


Additional logs captured via docker compose logs:
litellm-i  | ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
litellm-i  | Final returned optional params: {'num_predict': 256}
litellm-i  | self.optional_params: {'num_predict': 256}
litellm-i  | RAW RESPONSE:
litellm-i  | <bound method ClientResponse.text of <ClientResponse(http://linuxmain.local:11434/api/generate) [200 OK]>
litellm-i  | <CIMultiDictProxy('Content-Type': 'application/json; charset=utf-8', 'Date': 'Thu, 13 Jun 2024 17:17:10 GMT', 'Transfer-Encoding': 'chunked')>
litellm-i  | >
litellm-i  |
litellm-i  |
litellm-i  |
litellm-i  |
litellm-i  | Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.utils.Logging object at 0x7fb114484490>>
litellm-i  | Logging Details LiteLLM-Success Call: None
litellm-i  | Looking up model=ollama/wizardlm2 in model_cost_map
litellm-i  | Success: model=ollama/wizardlm2 in model_cost_map
litellm-i  | prompt_tokens=10; completion_tokens=225
litellm-i  | Returned custom cost for model=ollama/wizardlm2 - prompt_tokens_cost_usd_dollar: 0.00421, completion_tokens_cost_usd_dollar: 0.11699999999999999
litellm-i  | final cost: 0.12121; prompt_tokens_cost_usd_dollar: 0.00421; completion_tokens_cost_usd_dollar: 0.11699999999999999
litellm-i  | 17:17:10 - LiteLLM Router:INFO: router.py:684 - litellm.acompletion(model=ollama/wizardlm2) 200 OK
litellm-i  | 17:17:10 - LiteLLM Router:DEBUG: router.py:2008 - Async Response: ModelResponse(id='chatcmpl-d4372c4c-3389-46b3-87dd-6a5395e269dd', choices=[Choices(finish_reason='stop', index=0, message=Message(content="In the quiet hush of dawn's embrace,\n\nWhere dewdrops cling to leaves with grace,\nThe world awakens in a gentle sway,\nBeneath the canvas of a newborn day.\n\nA symphony of birds takes flight,\nComposing melodies of pure delight,\nAs the sun ascends with a golden light,\nChasing shadows into the night.\n\nPetals unfurl to greet the rays,\nWhispering secrets in soft array,\nNature's artistry on full display,\nIn this moment, time stops to play.\n\nMankind stirs, with dreams anew,\nIn the tapestry of life, we all partake of views,\nWhere every heart beats true,\nAnd in our souls, hope forever broke through.\n\nSo let this test request serve its purpose well,\nA reminder that even words can cast a spell,\nFor poetry is the soul's sweet knell,\nResonating with beauty in each telltale.", role='assistant'))], created=1718299030, model='ollama/wizardlm2', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=10, completion_tokens=225, total_tokens=235))
litellm-i  | 17:17:10 - LiteLLM:DEBUG: utils.py:1141 - Logging Details LiteLLM-Async Success Call
litellm-i  | 17:17:10 - LiteLLM:DEBUG: utils.py:1549 - Model=wizardlm2;
litellm-i  | 17:17:10 - LiteLLM:DEBUG: cost_calculator.py:159 - completion_response response ms: 8238.761
litellm-i  | 17:17:10 - LiteLLM:DEBUG: utils.py:1141 - Looking up model=ollama/wizardlm2 in model_cost_map
litellm-i  | 17:17:10 - LiteLLM:DEBUG: utils.py:1141 - Success: model=ollama/wizardlm2 in model_cost_map
litellm-i  | 17:17:10 - LiteLLM:DEBUG: utils.py:1141 - prompt_tokens=10; completion_tokens=225
litellm-i  | 17:17:10 - LiteLLM:DEBUG: utils.py:1141 - Returned custom cost for model=ollama/wizardlm2 - prompt_tokens_cost_usd_dollar: 0.00421, completion_tokens_cost_usd_dollar: 0.11699999999999999
litellm-i  | 17:17:10 - LiteLLM:DEBUG: utils.py:1141 - final cost: 0.12121; prompt_tokens_cost_usd_dollar: 0.00421; completion_tokens_cost_usd_dollar: 0.11699999999999999
litellm-i  | 17:17:10 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:21 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
litellm-i  | 17:17:10 - LiteLLM:DEBUG: caching.py:22 - get cache: cache key: My API Key::2024-06-13-17-17::request_count; local_only: False
litellm-i  | 17:17:10 - LiteLLM:DEBUG: caching.py:22 - get cache: cache result: None
litellm-i  | 17:17:10 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:21 - updated_value in success call: {'current_requests': 0, 'current_tpm': 470, 'current_rpm': 2}, precise_minute: 2024-06-13-17-17
litellm-i  | 17:17:10 - LiteLLM:DEBUG: caching.py:22 - set cache: key: My API Key::2024-06-13-17-17::request_count; value: {'current_requests': 0, 'current_tpm': 470, 'current_rpm': 2}
litellm-i  | 17:17:10 - LiteLLM:DEBUG: caching.py:22 - InMemoryCache: set_cache
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
litellm    |     await self.middleware_stack(scope, receive, send)
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
litellm    |     raise exc
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
litellm    |     await self.app(scope, receive, _send)
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
litellm    |     await self.app(scope, receive, send)
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
litellm    |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
litellm    |     raise exc
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
litellm    |     await app(scope, receive, sender)
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
litellm    |     await self.middleware_stack(scope, receive, send)
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
litellm    |     await route.handle(scope, receive, send)
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
litellm    |     await self.app(scope, receive, send)
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
litellm    |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
litellm    |     raise exc
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
litellm    |     await app(scope, receive, sender)
litellm    |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
litellm    |     response = await func(request)
litellm    |                ^^^^^^^^^^^^^^^^^^^
litellm    |   File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 296, in app
litellm    |     content = await serialize_response(
litellm    |               ^^^^^^^^^^^^^^^^^^^^^^^^^
litellm    |   File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 180, in serialize_response
litellm    |     return jsonable_encoder(response_content)
litellm    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
litellm    |   File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 333, in jsonable_encoder
litellm    |     return jsonable_encoder(
litellm    |            ^^^^^^^^^^^^^^^^^
litellm    |   File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 289, in jsonable_encoder
litellm    |     encoded_value = jsonable_encoder(
litellm    |                     ^^^^^^^^^^^^^^^^^
litellm    |   File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 332, in jsonable_encoder
litellm    |     raise ValueError(errors) from e
litellm    | ValueError: [TypeError("'async_generator' object is not iterable"), TypeError('vars() argument must have __dict__ attribute')]
litellm    | 17:11:25 - LiteLLM:DEBUG: caching.py:22 - async get cache: cache key: daily_metrics_report_sent; local_only: False
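
The final ValueError from fastapi.encoders.jsonable_encoder is the interesting part: the pair of TypeErrors means the /completions response handed to FastAPI contained an async generator (i.e. a streaming object) where a plain, serializable completion was expected, so the encoder falls back to dict() and then vars(), and both fail. A minimal standalone sketch (not litellm code, just an illustration of the encoder behaviour) reproduces the same error signature:

from fastapi.encoders import jsonable_encoder

async def fake_stream():
    yield "chunk"

try:
    jsonable_encoder(fake_stream())  # an async_generator object
except ValueError as e:
    # ValueError: [TypeError("'async_generator' object is not iterable"),
    #              TypeError('vars() argument must have __dict__ attribute')]
    print(e)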

Twitter / LinkedIn details

No response
