What happened?
LiteLLM is running as a proxy via Docker Compose. It works fine, but litellm --test returns an internal error.
Likely Ollama can't handle the number of requests, but the error message is strange.
The gpt-3.5 -> ollama/wizardlm2 connection is working fine, as per my test.
Relevant piece of config.yaml:
litellm_settings:
  drop_params: True
  num_retries: 3
  request_timeout: 50 # raise a Timeout error if a call takes longer than 50s; sets litellm.request_timeout
  allowed_fails: 3 # cooldown the model if it fails more than 3 calls in a minute
  set_verbose: True # debug logging is ON; switch this off for production
  cache: False # set to True to cache responses; litellm defaults to a redis cache
  cache_params: # cache params for redis
    type: redis
    namespace: "litellm_caching"
  json_logs: False
  # Callbacks: Traceloop and Prometheus (YAML keys must be unique, so both success callbacks go in one list)
  success_callback: ["traceloop", "prometheus"]
  failure_callback: ["prometheus"]
  service_callback: ["prometheus_system"]
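The model_list section that maps gpt-3.5 to ollama/wizardlm2 isn't shown above; based on the api_base visible in the logs, it presumably looks something like this (the alias name and exact layout are assumptions, not taken from the issue):

```yaml
model_list:
  - model_name: gpt-3.5-turbo          # alias the proxy exposes (assumed)
    litellm_params:
      model: ollama/wizardlm2
      api_base: http://linuxmain.local:11434   # Ollama endpoint seen in the logs
```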
Relevant log output
Log returned by litellm --test, run as: docker compose exec litellm litellm --test
...
LiteLLM: streaming response from proxy ChatCompletionChunk(id='chatcmpl-ab2b950a-2531-48d7-8ef2-64d10bd1d7aa', choices=[Choice(delta=ChoiceDelta(content='.', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1718298799, model='wizardlm2', object='chat.completion.chunk', system_fingerprint=None, usage=None)
LiteLLM: streaming response from proxy ChatCompletionChunk(id='chatcmpl-ab2b950a-2531-48d7-8ef2-64d10bd1d7aa', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1718298799, model='wizardlm2', object='chat.completion.chunk', system_fingerprint=None, usage=None)
making completion request to proxy
Traceback (most recent call last):
File "/usr/local/bin/litellm", line 8, in <module>
    sys.exit(run_server())
    ^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/litellm/proxy/proxy_cli.py", line 375, in run_server
response = client.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 277, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/openai/resources/completions.py", line 528, in create
return self._post(
^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1240, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 921, in request
return self._request(
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1005, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1053, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1005, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1053, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1020, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Internal Server Error
Other logs, obtained through docker compose logs:
litellm-i | ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
litellm-i | Final returned optional params: {'num_predict': 256}
litellm-i | self.optional_params: {'num_predict': 256}
litellm-i | RAW RESPONSE:
litellm-i | <bound method ClientResponse.text of <ClientResponse(http://linuxmain.local:11434/api/generate) [200 OK]>
litellm-i | <CIMultiDictProxy('Content-Type': 'application/json; charset=utf-8', 'Date': 'Thu, 13 Jun 2024 17:17:10 GMT', 'Transfer-Encoding': 'chunked')>
litellm-i | >
litellm-i |
litellm-i | Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.utils.Logging object at 0x7fb114484490>>
litellm-i | Logging Details LiteLLM-Success Call: None
litellm-i | Looking up model=ollama/wizardlm2 in model_cost_map
litellm-i | Success: model=ollama/wizardlm2 in model_cost_map
litellm-i | prompt_tokens=10; completion_tokens=225
litellm-i | Returned custom cost for model=ollama/wizardlm2 - prompt_tokens_cost_usd_dollar: 0.00421, completion_tokens_cost_usd_dollar: 0.11699999999999999
litellm-i | final cost: 0.12121; prompt_tokens_cost_usd_dollar: 0.00421; completion_tokens_cost_usd_dollar: 0.11699999999999999
litellm-i | 17:17:10 - LiteLLM Router:INFO: router.py:684 - litellm.acompletion(model=ollama/wizardlm2) 200 OK
litellm-i | 17:17:10 - LiteLLM Router:DEBUG: router.py:2008 - Async Response: ModelResponse(id='chatcmpl-d4372c4c-3389-46b3-87dd-6a5395e269dd', choices=[Choices(finish_reason='stop', index=0, message=Message(content="In the quiet hush of dawn's embrace,\n\nWhere dewdrops cling to leaves with grace,\nThe world awakens in a gentle sway,\nBeneath the canvas of a newborn day.\n\nA symphony of birds takes flight,\nComposing melodies of pure delight,\nAs the sun ascends with a golden light,\nChasing shadows into the night.\n\nPetals unfurl to greet the rays,\nWhispering secrets in soft array,\nNature's artistry on full display,\nIn this moment, time stops to play.\n\nMankind stirs, with dreams anew,\nIn the tapestry of life, we all partake of views,\nWhere every heart beats true,\nAnd in our souls, hope forever broke through.\n\nSo let this test request serve its purpose well,\nA reminder that even words can cast a spell,\nFor poetry is the soul's sweet knell,\nResonating with beauty in each telltale.", role='assistant'))], created=1718299030, model='ollama/wizardlm2', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=10, completion_tokens=225, total_tokens=235))
litellm-i | 17:17:10 - LiteLLM:DEBUG: utils.py:1141 - Logging Details LiteLLM-Async Success Call
litellm-i | 17:17:10 - LiteLLM:DEBUG: utils.py:1549 - Model=wizardlm2;
litellm-i | 17:17:10 - LiteLLM:DEBUG: cost_calculator.py:159 - completion_response response ms: 8238.761
litellm-i | 17:17:10 - LiteLLM:DEBUG: utils.py:1141 - Looking up model=ollama/wizardlm2 in model_cost_map
litellm-i | 17:17:10 - LiteLLM:DEBUG: utils.py:1141 - Success: model=ollama/wizardlm2 in model_cost_map
litellm-i | 17:17:10 - LiteLLM:DEBUG: utils.py:1141 - prompt_tokens=10; completion_tokens=225
litellm-i | 17:17:10 - LiteLLM:DEBUG: utils.py:1141 - Returned custom cost for model=ollama/wizardlm2 - prompt_tokens_cost_usd_dollar: 0.00421, completion_tokens_cost_usd_dollar: 0.11699999999999999
litellm-i | 17:17:10 - LiteLLM:DEBUG: utils.py:1141 - final cost: 0.12121; prompt_tokens_cost_usd_dollar: 0.00421; completion_tokens_cost_usd_dollar: 0.11699999999999999
litellm-i | 17:17:10 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:21 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
litellm-i | 17:17:10 - LiteLLM:DEBUG: caching.py:22 - get cache: cache key: My API Key::2024-06-13-17-17::request_count; local_only: False
litellm-i | 17:17:10 - LiteLLM:DEBUG: caching.py:22 - get cache: cache result: None
litellm-i | 17:17:10 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:21 - updated_value in success call: {'current_requests': 0, 'current_tpm': 470, 'current_rpm': 2}, precise_minute: 2024-06-13-17-17
litellm-i | 17:17:10 - LiteLLM:DEBUG: caching.py:22 - set cache: key: My API Key::2024-06-13-17-17::request_count; value: {'current_requests': 0, 'current_tpm': 470, 'current_rpm': 2}
litellm-i | 17:17:10 - LiteLLM:DEBUG: caching.py:22 - InMemoryCache: set_cache
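As an aside, the cost accounting in the success logs above is internally consistent. The per-token rates below are inferred from the log lines themselves (they are not taken from any config), but they reproduce the reported final cost:

```python
# Per-token rates inferred from the log (assumptions, derived from the
# reported totals: 0.00421 USD / 10 prompt tokens, 0.117 USD / 225 completion tokens)
prompt_rate = 0.00421 / 10       # USD per prompt token
completion_rate = 0.117 / 225    # USD per completion token

prompt_cost = 10 * prompt_rate
completion_cost = 225 * completion_rate
final_cost = prompt_cost + completion_cost

# Matches "final cost: 0.12121" in the log
print(round(final_cost, 5))
```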
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
litellm | await self.middleware_stack(scope, receive, send)
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
litellm | raise exc
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
litellm | await self.app(scope, receive, _send)
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
litellm | await self.app(scope, receive, send)
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
litellm | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
litellm | raise exc
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
litellm | await app(scope, receive, sender)
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
litellm | await self.middleware_stack(scope, receive, send)
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
litellm | await route.handle(scope, receive, send)
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
litellm | await self.app(scope, receive, send)
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
litellm | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
litellm | raise exc
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
litellm | await app(scope, receive, sender)
litellm | File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
litellm | response = await func(request)
litellm | ^^^^^^^^^^^^^^^^^^^
litellm | File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 296, in app
litellm | content = await serialize_response(
litellm | ^^^^^^^^^^^^^^^^^^^^^^^^^
litellm | File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 180, in serialize_response
litellm | return jsonable_encoder(response_content)
litellm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
litellm | File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 333, in jsonable_encoder
litellm | return jsonable_encoder(
litellm | ^^^^^^^^^^^^^^^^^
litellm | File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 289, in jsonable_encoder
litellm | encoded_value = jsonable_encoder(
litellm | ^^^^^^^^^^^^^^^^^
litellm | File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 332, in jsonable_encoder
litellm | raise ValueError(errors) from e
litellm | ValueError: [TypeError("'async_generator' object is not iterable"), TypeError('vars() argument must have __dict__ attribute')]
litellm | 17:11:25 - LiteLLM:DEBUG: caching.py:22 - async get cache: cache key: daily_metrics_report_sent; local_only: False
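The ValueError above comes from FastAPI's jsonable_encoder being handed a streaming response, i.e. an async generator, where a plain completion object was expected. The two wrapped TypeErrors can be reproduced in isolation; this is only a minimal sketch of how those error messages arise, not LiteLLM's actual code path:

```python
async def stream_chunks():
    # stand-in for a streaming proxy response (hypothetical name)
    yield "chunk"

agen = stream_chunks()

# An async generator cannot be iterated synchronously...
try:
    list(agen)
except TypeError as e:
    print(e)  # 'async_generator' object is not iterable

# ...and it has no __dict__, so vars()-based fallback encoding fails too
try:
    vars(agen)
except TypeError as e:
    print(e)  # vars() argument must have __dict__ attribute
```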