
More accurate time logging for ImageEncoder and fix concurrent image processing corruption #1765

Merged 3 commits into InternLM:main on Jun 18, 2024

Conversation

@irexyc (Collaborator) commented Jun 12, 2024

Motivation

#1759

@@ -104,23 +98,22 @@ def forward(self, inputs: List[Image]):
        """Model forward."""
        time_start = time.perf_counter()
        outputs = self.model.forward(inputs)
        if isinstance(outputs[0], torch.Tensor):
Collaborator: Do we still need to convert to CPU?

@irexyc (Author): The modification here is mainly to solve #1759.

Collaborator: Will it slow down the inference performance?

@irexyc (Author): Tested with llava-v1.5-7b and profile_restful_api.py (PR #1662).

Moving the tensor to CPU doesn't affect performance, but using asyncio.Task may affect it:

# with thread
number of prompt tokens: 248339
number of completion tokens: 240582
token throughput (completion token): 825.260 token/s
token throughput (prompt + completion token): 1677.128 token/s
RPS (request per second): 3.430 req/s
RPM (request per minute): 205.816 req/min

# with asyncio.Task
number of prompt tokens: 248339
number of completion tokens: 240582
token throughput (completion token): 808.719 token/s
token throughput (prompt + completion token): 1643.513 token/s
RPS (request per second): 3.362 req/s
RPM (request per minute): 201.691 req/min
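
For reference, a minimal sketch of the two scheduling strategies compared above, assuming a hypothetical encoder object with a blocking forward() method and a coroutine variant (async_forward is illustrative, not this PR's API):

import asyncio


async def encode_with_thread(encoder, inputs):
    # "with thread": push the blocking forward onto the default thread
    # pool; the event loop stays responsive while the GPU works.
    return await asyncio.get_event_loop().run_in_executor(
        None, encoder.forward, inputs)


async def encode_with_task(encoder, inputs):
    # "with asyncio.Task": schedule a coroutine on the event loop itself;
    # per the numbers above, this variant measured ~2% lower throughput.
    task = asyncio.create_task(encoder.async_forward(inputs))
    return await task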

Collaborator: If GPU utilization is high, you can add

        await asyncio.get_event_loop().run_in_executor(None,
                                                       self.stream.synchronize)

after forward, so other coroutines can use the CPU during the forward pass.

@irexyc (Author): According to my test, wrapping the whole forward call in the executor works better than synchronizing the stream afterwards:

# Option A: wrap the entire blocking forward in the executor.
outputs = await asyncio.get_event_loop().run_in_executor(
    None, self.forward, inputs)
# token throughput (completion token): 826.370 token/s

# Option B: run forward on the loop thread, then synchronize in the executor.
outputs = self.model.forward(inputs)
if isinstance(outputs[0], torch.Tensor):
    stream = torch.cuda.current_stream(outputs[0].device)
    await asyncio.get_event_loop().run_in_executor(
        None, stream.synchronize)
# token throughput (completion token): 815.130 token/s
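
Put together, a self-contained sketch of the faster variant (Option A) as it might sit in an encoder class — the class layout and names are illustrative; only the run_in_executor pattern and the CPU copy come from this PR:

import asyncio
from typing import List

import torch


class ImageEncoder:

    def __init__(self, model: torch.nn.Module):
        self.model = model

    def forward(self, inputs: List[torch.Tensor]) -> List[torch.Tensor]:
        # Blocking path: run the model, then copy results to host memory.
        # The copy implicitly synchronizes with the producing CUDA stream.
        outputs = self.model.forward(inputs)
        if isinstance(outputs[0], torch.Tensor):
            outputs = [x.cpu() for x in outputs]
        return outputs

    async def async_forward(self, inputs):
        # Offload the whole blocking forward (including the implicit
        # synchronization from .cpu()) to a worker thread, so other
        # coroutines keep using the CPU while the GPU is busy.
        return await asyncio.get_event_loop().run_in_executor(
            None, self.forward, inputs)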

@RunningLeon (Collaborator) reviewed: LGTM

@lvhan028 (Collaborator) commented: @irexyc Does this PR ONLY target #1759?

@irexyc (Author) commented Jun 18, 2024:

> Does this PR ONLY target #1759?

It also targets #1730.

@lvhan028 changed the title from "More accurate time logging for ImageEncoder" to "More accurate time logging for ImageEncoder and fix concurrent image processing corruption" on Jun 18, 2024.
@lvhan028 merged commit 812fb15 into InternLM:main on Jun 18, 2024. 5 checks passed.