
[Bug] GPT-4o streaming: webui display speed can't keep up with the transmission speed #2594

Open
hlc1209 opened this issue May 22, 2024 · 40 comments
Labels
🐛 Bug Something isn't working | 缺陷

Comments

@hlc1209

hlc1209 commented May 22, 2024

💻 System environment

Windows

📦 Deployment environment

Docker

🌐 Browser

Chrome

🐛 Problem description

With GPT-4o streaming, the display speed can't keep up with the transmission speed.
GPT-4o's output is extremely fast.
The UI's streamed display starts out slow, then suddenly speeds up after a few seconds; presumably the API has already finished transmitting while the display is still catching up.
The webui's display speed looks like a preset value rather than one sampled in real time.

🚦 Expected results

Please raise the preset values. Two of them need increasing:
1. The value for GPT-4o.
2. The maximum speed value. (This value also affects other scenarios: if I switch tabs and then switch back, the displayed answer is stale even though the API finished long ago. In that scenario the current maximum is nowhere near enough.)

Of course, the best solution would be real-time sampling, done on the backend.

📷 Steps to reproduce

Ask any question via the GPT-4o API. It is even more noticeable with long answers.

📝 Supplementary information

I'm connecting directly, without a proxy.

@hlc1209 hlc1209 added the 🐛 Bug Something isn't working | 缺陷 label May 22, 2024

@lobehubbot
Member

👀 @hlc1209

Thank you for raising an issue. We will investigate the matter and get back to you as soon as possible.
Please make sure you have given us as much context as possible.

@sxjeru
Contributor

sxjeru commented May 22, 2024

#945 | #1197

Since real-time streaming output from a large model can feel very choppy, we buffer it to smooth the output.
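
For illustration, a minimal sketch of the kind of fixed-rate smoothing buffer described above (all names and numbers are hypothetical, not lobe-chat's actual code): incoming stream chunks are queued and drained at a constant pace, which is exactly why a fast model can finish transmitting long before the UI finishes typing.

```ts
// Hypothetical sketch of a fixed-rate smoothing buffer, not lobe-chat's
// actual implementation. Incoming stream chunks are queued and drained at
// a constant characters-per-tick rate so the UI types evenly.
class SmoothingBuffer {
  private queue = '';
  private timer?: ReturnType<typeof setInterval>;

  constructor(
    private onText: (text: string) => void,
    private charsPerTick = 2, // assumed preset display speed
    private tickMs = 16, // roughly one frame at 60 fps
  ) {}

  push(chunk: string) {
    this.queue += chunk;
    if (!this.timer) this.start();
  }

  private start() {
    this.timer = setInterval(() => {
      if (this.queue.length === 0) {
        clearInterval(this.timer);
        this.timer = undefined;
        return;
      }
      // Emit a fixed slice per tick; the rest stays queued, which is why a
      // fast provider can finish long before the UI catches up.
      this.onText(this.queue.slice(0, this.charsPerTick));
      this.queue = this.queue.slice(this.charsPerTick);
    }, this.tickMs);
  }
}
```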


@hlc1209
Author

hlc1209 commented May 22, 2024

#945 | #1197

Since real-time streaming output from a large model can feel very choppy, we buffer it to smooth the output.

Thanks for the reply.
I think large-model output speeds will only keep getting faster, so I hope this can be improved, or offered as a separate option.
Waiting is genuinely painful: you know all the output has already arrived, yet you still have to watch it trickle out character by character.
For long outputs this often means an extra 15 seconds of waiting, or more.


@arvinxx
Contributor

arvinxx commented May 22, 2024

@hlc1209 An excessively high output rate isn't necessarily a good thing either… there are also users reporting screen jitter caused by output that is too fast: #2534


@sxjeru
Contributor

sxjeru commented May 22, 2024

I wonder whether you've tried outputting the remaining content all at once after the API finishes generating, without auto-scrolling down. How does that feel?


@hlc1209
Author

hlc1209 commented May 22, 2024

@hlc1209 An excessively high output rate isn't necessarily a good thing either… there are also users reporting screen jitter caused by output that is too fast: #2534

The case you raised actually illustrates my problem perfectly.
It is slow at first because of the buffering plus a display speed set too low. The sudden flood of output at the end is the API having already finished, at which point the UI starts accelerating.

The best solution would of course be real-time streaming like the OpenAI app's. Edit: it doesn't have to be real-time streaming; an accurate estimate is more likely.
Second best is a higher buffered output speed for gpt-4o.
The simplest fix of all is to avoid auto-scrolling.


@hlc1209
Author

hlc1209 commented May 22, 2024

I wonder whether you've tried outputting the remaining content all at once after the API finishes generating, without auto-scrolling down. How does that feel?

That is a very good solution for #2534.
But it doesn't solve the long wait times on GPT-4o's long outputs.

Edited

This approach is a fairly good remedy.


@arvinxx
Contributor

arvinxx commented May 22, 2024

@hlc1209 Could you record your screen showing gpt-4o's output in the official ChatGPT app? I've been wondering whether automatic dynamic rate adjustment is feasible here. The earlier PR #1197 actually raised the opposite concern: ideally even very slow providers should get smooth output.

Once 1.0 ships, I'd like to look into whether this part of the implementation can be optimized.


@sxjeru
Contributor

sxjeru commented May 22, 2024

It would also be nice to let users adjust the smoothed output rate while the API is generating, mainly for the browsing experience on narrow-screen devices.


@arvinxx
Contributor

arvinxx commented May 22, 2024

@sxjeru Let's skip user-adjustable settings; that is too fiddly. Better to think about how to automate it.


@hlc1209
Author

hlc1209 commented May 22, 2024

@hlc1209 Could you record your screen showing gpt-4o's output in the official ChatGPT app? I've been wondering whether automatic dynamic rate adjustment is feasible here. The earlier PR #1197 actually raised the opposite concern: ideally even very slow providers should get smooth output.

Once 1.0 ships, I'd like to look into whether this part of the implementation can be optimized.

I'm a bit lazy hahaha, but I can describe it.
Actually the official app isn't perfect either. My personal guess is that OpenAI knows the current output speed (down to the load on the specific server each user is connected to), so it can usually make a fairly good speed estimate. Even so, I still often see a big block of text burst out at very high speed at the end.

In short, I think the official app also has a display buffer; it simply has internal data that yields a fairly accurate estimate in most cases.
So I think a big block suddenly appearing at the end is acceptable, i.e. set an extremely fast display speed.
Plus no auto-scrolling.


@arvinxx
Contributor

arvinxx commented May 22, 2024

Even so, I still often see a big block of text burst out at very high speed at the end.

I ran into this too when I first used the ChatGPT app, so the implementation logic was actually borrowed from there. The fundamental reason their experience is better is, as you said, that they know how many tokens are output per second. For us there is basically no way to set a fixed value. (After all, we support a proxy URL, and we have no idea whether a third-party provider has wrapped yet another shell around 4o…)

So the ultimate solution should still be to compute a TPS from the SSE intervals and the output speed, and then implement dynamic rate adjustment.
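
As a rough illustration of that idea (hypothetical names and numbers, not an existing lobe-chat API): estimate throughput from the spacing of incoming SSE chunks with an exponential moving average, then drain the display buffer at the measured rate.

```ts
// Hypothetical sketch of dynamic rate adjustment: measure chars-per-ms from
// the gaps between SSE chunks, smooth it with an exponential moving average,
// and derive the per-frame drain rate from the estimate.
class AdaptiveRate {
  private lastChunkAt = performance.now();
  private emaCharsPerMs = 0.05; // initial guess: ~50 chars/sec

  // Call once per SSE chunk; returns the updated chars-per-ms estimate.
  observe(chunkLength: number): number {
    const now = performance.now();
    const dtMs = Math.max(now - this.lastChunkAt, 1);
    this.lastChunkAt = now;
    const instant = chunkLength / dtMs;
    // The moving average keeps the estimate stable across jittery chunk
    // timings while still adapting when the provider speeds up or slows down.
    this.emaCharsPerMs = 0.8 * this.emaCharsPerMs + 0.2 * instant;
    return this.emaCharsPerMs;
  }

  // Characters the UI should emit for a frame lasting `frameMs` milliseconds.
  charsForFrame(frameMs: number): number {
    return Math.max(1, Math.round(this.emaCharsPerMs * frameMs));
  }
}
```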


@hlc1209
Author

hlc1209 commented May 22, 2024

To be honest, though, back when the original GPT-4 (still the largest model today, and very slow) first launched, I often hit the problem mentioned in #1197 in the official app.
Personally I don't think it needs handling 😂

In short, before 1.0 the cleanest and most elegant fix would be:

  1. Give users an option for real-time streaming, enabled by default for mature commercial products from OpenAI, Anthropic, etc. The option can simply go under "Settings - Language Model" for each provider.
  2. Once the API transmission finishes, switch to a very high output speed and flush the remaining characters in one go.
  3. Turn off auto-scrolling as soon as user scrolling is detected (see the sketch below).

After 1.0, the best approach is, as you said, dynamic rate adjustment.
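
A minimal sketch of item 3 above, using plain DOM APIs (the helper itself is hypothetical): treat the user as following the stream only while they are at the bottom of the list, and only auto-scroll in that state.

```ts
// Hypothetical sketch: pause auto-scroll when the user scrolls away from
// the bottom of the message list, and resume once they scroll back down.
function trackAutoScroll(el: HTMLElement): () => void {
  let following = true;

  el.addEventListener('scroll', () => {
    // Within a few pixels of the bottom counts as following the stream.
    following = el.scrollHeight - el.scrollTop - el.clientHeight < 4;
  });

  // Call the returned function whenever new streamed text is appended.
  return () => {
    if (following) el.scrollTop = el.scrollHeight;
  };
}
```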


@arvinxx
Contributor

arvinxx commented May 22, 2024

Turn off auto-scrolling as soon as user scrolling is detected

I think we can add this; it is a fairly low-cost option right now and the experience should be decent.


@hlc1209
Author

hlc1209 commented May 22, 2024

Turn off auto-scrolling as soon as user scrolling is detected

I think we can add this; it is a fairly low-cost option right now and the experience should be decent.

Don't forget to also raise the maximum output speed used after the API transmission completes 😂 that change is cheap too.
Right now I'm being tortured by countless extra 15-second waits for output.


@hlc1209
Author

hlc1209 commented May 22, 2024

2. The maximum speed value. (This value also affects other scenarios: if I switch tabs and then switch back, the displayed answer is stale even though the API finished long ago. In that scenario the current maximum is nowhere near enough.)


@sxjeru
Contributor

sxjeru commented May 22, 2024

Turn off auto-scrolling as soon as user scrolling is detected

I think we can add this; it is a fairly low-cost option right now and the experience should be decent.

This feature was implemented in #2223. Is the ask to temporarily disable auto-scrolling until the current message finishes generating?


@arvinxx
Contributor

arvinxx commented May 23, 2024

@sxjeru My suggestion is to temporarily stop auto-scrolling once the API has finished outputting; that seems better.


@BrandonStudio
Contributor

I remember having this problem with Anthropic Claude too:
even after clicking stop, it would speed up and display the replies that hadn't been shown yet.


@mxdlzg

mxdlzg commented Jun 1, 2024

Could the display speed be adjusted dynamically based on the backlog of streamed messages received locally? If the backlog exceeds a threshold, the display speed could be raised by a multiplier or exponentially.
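
A small sketch of that backlog-based idea (hypothetical function, illustrative numbers): keep the base speed while the backlog is small, then ramp up exponentially once it crosses a threshold.

```ts
// Hypothetical sketch: scale display speed by the amount of received but
// not-yet-displayed text. Below the threshold, type at the base rate; above
// it, double the rate for each additional threshold's worth of backlog.
function drainSpeed(
  baseCharsPerTick: number,
  backlogChars: number,
  thresholdChars = 200,
): number {
  if (backlogChars <= thresholdChars) return baseCharsPerTick;
  const factor = Math.pow(2, backlogChars / thresholdChars - 1);
  return Math.ceil(baseCharsPerTick * factor);
}
```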


Labels
🐛 Bug Something isn't working | 缺陷
Projects
Status: Roadmap - Chat 1.x
Development

No branches or pull requests

6 participants