
feat: skip invokeFlattenKV_v2_ when fp16 and bf16 with CacheType::kBlock #1683

Open · wants to merge 1 commit into main
Conversation

@zhyncs (Contributor) commented May 29, 2024

Motivation and Modification

As titled: skip `invokeFlattenKV_v2_` for fp16 and bf16 when the KV cache type is `CacheType::kBlock`.
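A minimal sketch of what such a guard could look like, for orientation only. `invokeFlattenKV_v2_` and `CacheType::kBlock` come from this PR; the enum's other member, `AttentionParams`, and `dispatchAttention` are illustrative scaffolding, not LMDeploy's actual code.

```cpp
// Hypothetical sketch of the dispatch-time guard; not the PR's actual diff.
#include <type_traits>
#include <cuda_fp16.h>
#include <cuda_bf16.h>

enum class CacheType { kLinear, kBlock };  // assumed shape of the enum

template<typename T>
struct AttentionParams {  // illustrative stand-in for the real params struct
    CacheType cache_type;
    // ... pointers, strides, etc. ...
};

template<typename T>
void invokeFlattenKV_v2_(const AttentionParams<T>&);  // declaration only

template<typename T>
void dispatchAttention(const AttentionParams<T>& params)
{
    // Assumption behind the skip: the fp16/bf16 attention kernels can read
    // the blocked KV cache in place, so flattening it into a contiguous
    // buffer first is redundant work.
    constexpr bool is_16bit =
        std::is_same_v<T, half> || std::is_same_v<T, __nv_bfloat16>;

    if (!(is_16bit && params.cache_type == CacheType::kBlock)) {
        invokeFlattenKV_v2_(params);  // other dtypes / layouts still flatten
    }
    // ... launch the attention kernels ...
}
```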

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

Checklist

  1. Pre-commit or other linting tools have been used to fix potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  3. If the modification depends on a newer version of a downstream project, this PR should be tested against all supported versions of that project.
  4. The documentation has been updated accordingly (e.g., docstrings or example tutorials).

@lzhangzz (Collaborator) commented Jun 4, 2024

What about BF16? It should be the same as FP16.

@zhyncs (Contributor, Author) commented Jun 4, 2024

> What about BF16? It should be the same as FP16.

Yep, I'll land the code soon.

@zhyncs changed the title from "feat: skip invokeFlattenKV_v2_ when fp16 with CacheType::kBlock" to "feat: skip invokeFlattenKV_v2_ when fp16 and bf16 with CacheType::kBlock" on Jun 4, 2024
@lvhan028 requested a review from @lzhangzz on Jun 4, 2024
@zhyncs (Contributor, Author) commented Jun 4, 2024

Verified throughput and correctness on Llama-2-13B-Chat; results are consistent with the base branch.

@lzhangzz (Collaborator) commented Jun 5, 2024

An estimated ~9% performance drop for prefilling approximately 200k tokens with Llama3-8B:

| this PR  | v0.4.2   |
| -------- | -------- |
| 69520.48 | 63873.17 |
| 69465.04 | 63672.55 |
| 69441.23 | 63659.92 |
| 69397.99 | 63625.35 |
| 69396.67 | 63574.90 |
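(For scale: 69520.48 / 63873.17 ≈ 1.088, i.e. about 9% higher; assuming these figures are elapsed prefill times, where lower is better, that matches the reported drop.)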

@zhyncs (Contributor, Author) commented Jun 5, 2024

> An estimated ~9% performance drop for prefilling approximately 200k tokens with Llama3-8B:
>
> | this PR  | v0.4.2   |
> | -------- | -------- |
> | 69520.48 | 63873.17 |
> | 69465.04 | 63672.55 |
> | 69441.23 | 63659.92 |
> | 69397.99 | 63625.35 |
> | 69396.67 | 63574.90 |

OK, I'll run a detailed timeline analysis with Llama3-8B later. Do you have any suggestions, such as making this feature configurable?
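One possible shape for such a switch, as a hypothetical sketch only: gate the skip behind an environment variable so the old path stays available for A/B comparisons. The variable name `TM_SKIP_FLATTEN_KV` and the helper below are illustrative, not existing LMDeploy options.

```cpp
// Hypothetical sketch: make the flatten-KV skip opt-out via an env var.
// TM_SKIP_FLATTEN_KV is an illustrative name, not an existing option.
#include <cstdlib>
#include <cstring>

static bool skip_flatten_kv_enabled()
{
    // Default to the new behavior; setting TM_SKIP_FLATTEN_KV=0 restores
    // the old flattening path for benchmarking or debugging.
    static const bool enabled = [] {
        const char* v = std::getenv("TM_SKIP_FLATTEN_KV");
        return v == nullptr || std::strcmp(v, "0") != 0;
    }();
    return enabled;
}
```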
