{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":634081686,"defaultBranch":"main","name":"mlc-llm","ownerLogin":"mlc-ai","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2023-04-29T01:59:25.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/106173866?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1719326437.0","currentOid":""},"activityList":{"items":[{"before":"d216c0121ff6c2355d88279927686d1c16e60402","after":"a7b1022e0b02fa261b13df08fa84eccd4d7eb072","ref":"refs/heads/gh-pages","pushedAt":"2024-06-25T21:24:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Tue Jun 25 21:24:42 UTC 2024","shortMessageHtmlLink":"Build at Tue Jun 25 21:24:42 UTC 2024"}},{"before":"437166a4db76355175fa5847551d6f302f19a974","after":"6a48a02eb9a96988dfc22bbe0bd95dd0305be1dc","ref":"refs/heads/main","pushedAt":"2024-06-25T21:17:11.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"MasterJH5574","name":"Ruihang Lai","path":"/MasterJH5574","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/45167100?s=80&v=4"},"commit":{"message":"[Serving] Hybrid prefill (#2604)\n\nThis PR adds the support for the hybrid prefill. So during the prefill\r\nengine action, it will do the decode for running requests as well.","shortMessageHtmlLink":"[Serving] Hybrid prefill (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2372502075\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2604\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2604/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2604\">#2604</a>)"}},{"before":null,"after":"59a3934cee617db054a29e2ed85110bbcbe409f7","ref":"refs/heads/internlm2-loader","pushedAt":"2024-06-25T14:40:37.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"MasterJH5574","name":"Ruihang Lai","path":"/MasterJH5574","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/45167100?s=80&v=4"},"commit":{"message":"internlm2 loader","shortMessageHtmlLink":"internlm2 loader"}},{"before":"5ac1832710d67e014f2c5677149371178a23d990","after":"d216c0121ff6c2355d88279927686d1c16e60402","ref":"refs/heads/gh-pages","pushedAt":"2024-06-19T18:58:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Wed Jun 19 18:58:26 UTC 2024","shortMessageHtmlLink":"Build at Wed Jun 19 18:58:26 UTC 2024"}},{"before":"e9340c36693a2ccd842d30e944094f23ae7b91f7","after":"437166a4db76355175fa5847551d6f302f19a974","ref":"refs/heads/main","pushedAt":"2024-06-19T18:51:15.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tqchen","name":"Tianqi Chen","path":"/tqchen","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2577440?s=80&v=4"},"commit":{"message":"[Model] Gemma 1.1 compatibility (#2594)\n\nThis PR updates the Gemma config so that MLC can work properly with\r\nGemma 1.1.","shortMessageHtmlLink":"[Model] Gemma 1.1 compatibility (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2360994994\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2594\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2594/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2594\">#2594</a>)"}},{"before":"47dcbe9eabe734b1a76dd9801d7a5377fa1c95f4","after":"5ac1832710d67e014f2c5677149371178a23d990","ref":"refs/heads/gh-pages","pushedAt":"2024-06-17T12:22:41.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Mon Jun 17 12:22:41 UTC 2024","shortMessageHtmlLink":"Build at Mon Jun 17 12:22:41 UTC 2024"}},{"before":"75b970b4f5c2729b6e05f655f29d5133a1c03a02","after":"e9340c36693a2ccd842d30e944094f23ae7b91f7","ref":"refs/heads/main","pushedAt":"2024-06-17T12:15:54.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tqchen","name":"Tianqi Chen","path":"/tqchen","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2577440?s=80&v=4"},"commit":{"message":"[Op] Top-4 implementation for MoE model (#2586)\n\nThis PR introduces a top-4 kernel for MoE model (particularly for\r\nthe Qwen-MoE) at this moment.\r\n\r\nThis is still a manual implementation and has some duplication\r\nwith the existing top-2 kernel. In the future we'll consider leveraging\r\nmeta-programming of TIR to unify the top-k kernel implementations.","shortMessageHtmlLink":"[Op] Top-4 implementation for MoE model (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2356092444\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2586\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2586/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2586\">#2586</a>)"}},{"before":"eb3819c688cb56599b8bbd4dc862286ed0cd74dc","after":"47dcbe9eabe734b1a76dd9801d7a5377fa1c95f4","ref":"refs/heads/gh-pages","pushedAt":"2024-06-14T02:30:29.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Fri Jun 14 02:30:29 UTC 2024","shortMessageHtmlLink":"Build at Fri Jun 14 02:30:29 UTC 2024"}},{"before":"ceba9511df3da06a8541916522d57fdc99cb6f54","after":"75b970b4f5c2729b6e05f655f29d5133a1c03a02","ref":"refs/heads/main","pushedAt":"2024-06-14T02:22:54.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"CharlieFRuan","name":"Charlie Ruan","path":"/CharlieFRuan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/53290280?s=80&v=4"},"commit":{"message":"[Doc] Update WebLLM doc (#2578)\n\nUpdate documentation for WebLLM. Currently we only provide a high-level view for WebLLM runtime here, and refer user to the WebLLM repo README for more. The documentation focuses on adding their own model variant / model library for WebLLM. Will follow up with more thorough runtime documentation.","shortMessageHtmlLink":"[Doc] Update WebLLM doc (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2352200039\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2578\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2578/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2578\">#2578</a>)"}},{"before":"1415c847cdca5b1b03cb1123b2d164c5909e8922","after":"eb3819c688cb56599b8bbd4dc862286ed0cd74dc","ref":"refs/heads/gh-pages","pushedAt":"2024-06-13T11:37:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Thu Jun 13 11:37:12 UTC 2024","shortMessageHtmlLink":"Build at Thu Jun 13 11:37:12 UTC 2024"}},{"before":"94a029526b224a577ecec366578476ebdc05fbd4","after":"ceba9511df3da06a8541916522d57fdc99cb6f54","ref":"refs/heads/main","pushedAt":"2024-06-13T11:30:26.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tqchen","name":"Tianqi Chen","path":"/tqchen","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2577440?s=80&v=4"},"commit":{"message":"[Metrics] Add missing fields in `Reset` (#2574)\n\nThis PR adds the missing fields that were not cleared up in\n`EngineMetrics::Reset`.","shortMessageHtmlLink":"[Metrics] Add missing fields in <code>Reset</code> (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2349894176\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2574\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2574/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2574\">#2574</a>)"}},{"before":"e24c394cbe931847ea4b939c54fcc86f1805a9c4","after":"1415c847cdca5b1b03cb1123b2d164c5909e8922","ref":"refs/heads/gh-pages","pushedAt":"2024-06-13T05:36:10.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Thu Jun 13 05:36:10 UTC 2024","shortMessageHtmlLink":"Build at Thu Jun 13 05:36:10 UTC 2024"}},{"before":"07c92b04d8a8ba628a01ea3c02a9c936343a7992","after":"94a029526b224a577ecec366578476ebdc05fbd4","ref":"refs/heads/main","pushedAt":"2024-06-13T05:29:42.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"Hzfengsy","name":"Siyuan Feng","path":"/Hzfengsy","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/25500082?s=80&v=4"},"commit":{"message":"[Model] Support Multi-GPU for Qwen-MoE model (#2573)\n\nThis PR introduces the multi-GPU support for the Qwen-MoE model.\r\nValidated on 4090x2.","shortMessageHtmlLink":"[Model] Support Multi-GPU for Qwen-MoE model (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2349892431\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2573\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2573/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2573\">#2573</a>)"}},{"before":"7f0c019e6579b209dde38f5118a054c7a5557f9f","after":"e24c394cbe931847ea4b939c54fcc86f1805a9c4","ref":"refs/heads/gh-pages","pushedAt":"2024-06-12T21:28:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Wed Jun 12 21:28:13 UTC 2024","shortMessageHtmlLink":"Build at Wed Jun 12 21:28:13 UTC 2024"}},{"before":"dcece515ec9063b3e11c558382d94ff3f6526379","after":"07c92b04d8a8ba628a01ea3c02a9c936343a7992","ref":"refs/heads/main","pushedAt":"2024-06-12T21:21:28.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"cyx-6","name":"Yaxing Cai","path":"/cyx-6","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16838183?s=80&v=4"},"commit":{"message":"[Bench] Json mode bench (#2552)\n\n* [Bench] Json mode bench\r\n\r\nThis PR refactors mlc bench to enable json mode in dataset.\r\n\r\n* upd\r\n\r\n* fix lint","shortMessageHtmlLink":"[Bench] Json mode bench (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2341295861\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2552\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2552/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2552\">#2552</a>)"}},{"before":"7a8502807692d6d5b3dba3b3a1bc26ddf0791963","after":"7f0c019e6579b209dde38f5118a054c7a5557f9f","ref":"refs/heads/gh-pages","pushedAt":"2024-06-12T11:22:28.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Wed Jun 12 11:22:28 UTC 2024","shortMessageHtmlLink":"Build at Wed Jun 12 11:22:28 UTC 2024"}},{"before":"eca0898d2c2ad6292075d9df069b5e5bd3e36782","after":"7a8502807692d6d5b3dba3b3a1bc26ddf0791963","ref":"refs/heads/gh-pages","pushedAt":"2024-06-12T11:21:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Wed Jun 12 11:21:01 UTC 2024","shortMessageHtmlLink":"Build at Wed Jun 12 11:21:01 UTC 2024"}},{"before":"873827c25ca1f9d09c6eaa671fc9363c5ee135f9","after":"dcece515ec9063b3e11c558382d94ff3f6526379","ref":"refs/heads/main","pushedAt":"2024-06-12T11:14:45.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tqchen","name":"Tianqi Chen","path":"/tqchen","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2577440?s=80&v=4"},"commit":{"message":"[Serving] Apply tree structure in draft token verification (#2563)\n\nThis adds the interface to draft token state and sampler to allow tree\nstructure being recorded and used for verification","shortMessageHtmlLink":"[Serving] Apply tree structure in draft token verification (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2345078549\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2563\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2563/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2563\">#2563</a>)"}},{"before":"a231ae1215bd7e06a2f5eddb4e826cd873e69820","after":"873827c25ca1f9d09c6eaa671fc9363c5ee135f9","ref":"refs/heads/main","pushedAt":"2024-06-12T11:14:08.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tqchen","name":"Tianqi Chen","path":"/tqchen","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2577440?s=80&v=4"},"commit":{"message":"[Model] Enhance error reporting for invalid tensor-parallel settings (#2566)\n\nThis PR enhances the error reporting for multi-GPU model compilation,\nso we can provide as many error reasons as possible before loading and\nrunning the models.","shortMessageHtmlLink":"[Model] Enhance error reporting for invalid tensor-parallel settings (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2346993736\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2566\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2566/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2566\">#…</a>"}},{"before":"61ca3ae8e6f403f98843d23d6b1369f825849885","after":"eca0898d2c2ad6292075d9df069b5e5bd3e36782","ref":"refs/heads/gh-pages","pushedAt":"2024-06-11T19:41:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Tue Jun 11 19:41:43 UTC 2024","shortMessageHtmlLink":"Build at Tue Jun 11 19:41:43 UTC 2024"}},{"before":"42f146d495862f36144a7bbe9a3e966c513e1e36","after":"a231ae1215bd7e06a2f5eddb4e826cd873e69820","ref":"refs/heads/main","pushedAt":"2024-06-11T19:34:36.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"MasterJH5574","name":"Ruihang Lai","path":"/MasterJH5574","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/45167100?s=80&v=4"},"commit":{"message":"[Delivery] Update model delivery script (#2565)\n\nSome improvements of the delivery script:\r\n\r\n- provide different overrides for different quantization. e.g. we can change\r\nprefill chunk size for q0/q3/q4\r\n- rerun gen config only if only conv_template changes\r\n- do NOT recreate HF repo when the repo already exists. This will preserve\r\ncommit history\r\n- dry-run validation","shortMessageHtmlLink":"[Delivery] Update model delivery script (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2345869633\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2565\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2565/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2565\">#2565</a>)"}},{"before":"00b22dbde57a5a88c01d408abf6656b3512840d6","after":"61ca3ae8e6f403f98843d23d6b1369f825849885","ref":"refs/heads/gh-pages","pushedAt":"2024-06-11T17:11:11.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Tue Jun 11 17:11:11 UTC 2024","shortMessageHtmlLink":"Build at Tue Jun 11 17:11:11 UTC 2024"}},{"before":"4234262761b971c970be3c669bd8c8c41ba1db14","after":"42f146d495862f36144a7bbe9a3e966c513e1e36","ref":"refs/heads/main","pushedAt":"2024-06-11T17:04:35.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tqchen","name":"Tianqi Chen","path":"/tqchen","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2577440?s=80&v=4"},"commit":{"message":"[Serving][Grammar] Jump-forward decoding (#2551)\n\n[Serve][Grammar] Jump-forward decoding\r\n\r\nThis PR supports the jump-forward decoding as described in\r\n<https://lmsys.org/blog/2024-02-05-compressed-fsm/>. The jump-forward\r\ndecoding uses the grammar constraint to predict the next output string and\r\ntokenize the string into tokens, and therefore speeds up the decoding.\r\n\r\nThis PR implements these optimizations to ensure the output quality:\r\n- Retokenization in jumpforward: Tokenize the last k token as string appended with the predicted\r\n  string. If the tokenization result differs from the old tokens, roll back\r\n  these tokens and accept the new ones.\r\n- Retokenization in decoding: Tokenize the last k token as string appended with\r\n  the decoded token. This will happen in decoding stage when the jumpforward decoding happens\r\n  in the last round. If the result differs, the old tokens will be rolled back.\r\n- Skip prefix tokens in jumpforward: We call tokens that is a prefix of another token\r\n  as prefix tokens. If the last token from jumpforward is a prefix token, it's highly possible\r\n  that it will be rolled back in the next decode stage, as it may be combined with the\r\n  decoded token. It also effects the output distribution as such pattern is rare in training data.\r\n  Therefore, we skip the last prefix token in jumpforward decoding.\r\n\r\nThis PR also includes the following changes:\r\n- Add several metrics for request and engine, especially about the jumpforward decoding\r\n- Fix a bug in `_async_query_engine_metrics` to avoid throwing CancelledError from early return\r\n\r\nPerformance and benchmark:\r\n\r\nSchema(Pydantic):\r\n```\r\nclass Product(BaseModel):\r\n    product_id: int\r\n    is_available: bool\r\n    price: float\r\n    is_featured: Literal[True]\r\n    category: Literal[\"Electronics\", \"Clothing\", \"Food\"]\r\n    tags: List[str]\r\n    stock: Dict[str, int]\r\n```\r\n\r\nPlatform: AMD Ryzen 9 5900X, NVIDIA 3080 10G\r\n\r\nResults:\r\n```\r\nJump forward: False, Batch: 1\r\nEngine metrics:\r\n{\r\n    \"engine_decode_time_sum\": 0.4988938220000001,\r\n    \"engine_jump_forward_time_sum\": 0,\r\n    \"completion_tokens_sum\": 66,\r\n    \"decode_tokens_sum\": 66,\r\n    \"jump_forward_tokens_sum\": 0,\r\n    \"decode_tokens_per_s\": 132.2926785010378,\r\n}\r\nJump forward: True, Batch: 1\r\nEngine metrics:\r\n{\r\n    \"engine_decode_time_sum\": 0.37242740600000007,\r\n    \"engine_jump_forward_time_sum\": 0.027989265000000006,\r\n    \"completion_tokens_sum\": 68,\r\n    \"decode_tokens_sum\": 68,\r\n    \"jump_forward_tokens_sum\": 28,\r\n    \"decode_tokens_per_s\": 182.58591850246378,\r\n}\r\nJump forward: False, Batch: 4\r\nEngine metrics:\r\n{\r\n    \"engine_decode_time_sum\": 0.9106805410000002,\r\n    \"engine_jump_forward_time_sum\": 0,\r\n    \"completion_tokens_sum\": 261,\r\n    \"decode_tokens_sum\": 261,\r\n    \"jump_forward_tokens_sum\": 0,\r\n    \"decode_tokens_per_s\": 286.5988546470984,\r\n}\r\nJump forward: True, Batch: 4\r\nEngine metrics:\r\n{\r\n    \"engine_decode_time_sum\": 0.6843025599999999,\r\n    \"engine_jump_forward_time_sum\": 0.028089531999999997,\r\n    \"completion_tokens_sum\": 266,\r\n    \"decode_tokens_sum\": 266,\r\n    \"jump_forward_tokens_sum\": 112,\r\n    \"decode_tokens_per_s\": 388.71694415405966,\r\n}\r\nJump forward: False, Batch: 8\r\nEngine metrics:\r\n{\r\n    \"engine_decode_time_sum\": 1.62462493,\r\n    \"engine_jump_forward_time_sum\": 0,\r\n    \"completion_tokens_sum\": 538,\r\n    \"decode_tokens_sum\": 538,\r\n    \"jump_forward_tokens_sum\": 0,\r\n    \"decode_tokens_per_s\": 331.1533573475325,\r\n}\r\nJump forward: True, Batch: 8\r\nEngine metrics:\r\n{\r\n    \"engine_decode_time_sum\": 1.0509048310000002,\r\n    \"engine_jump_forward_time_sum\": 0.027971332000000022,\r\n    \"completion_tokens_sum\": 525,\r\n    \"decode_tokens_sum\": 525,\r\n    \"jump_forward_tokens_sum\": 224,\r\n    \"decode_tokens_per_s\": 499.5694990767436,\r\n}\r\nJump forward: False, Batch: 16\r\nEngine metrics:\r\n{\r\n    \"engine_decode_time_sum\": 2.317279175,\r\n    \"engine_jump_forward_time_sum\": 0,\r\n    \"completion_tokens_sum\": 1068,\r\n    \"decode_tokens_sum\": 1068,\r\n    \"jump_forward_tokens_sum\": 0,\r\n    \"decode_tokens_per_s\": 460.8853398080531,\r\n}\r\nJump forward: True, Batch: 16\r\nEngine metrics:\r\n{\r\n    \"engine_decode_time_sum\": 1.3962938819999997,\r\n    \"engine_jump_forward_time_sum\": 0.030129287999999994,\r\n    \"completion_tokens_sum\": 1059,\r\n    \"decode_tokens_sum\": 1059,\r\n    \"jump_forward_tokens_sum\": 448,\r\n    \"decode_tokens_per_s\": 758.4363246533227,\r\n}\r\n```","shortMessageHtmlLink":"[Serving][Grammar] Jump-forward decoding (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2341114671\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2551\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2551/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2551\">#2551</a>)"}},{"before":"7f4d63d1d92b6696c20b10c334a6aa1a3030c7d3","after":"00b22dbde57a5a88c01d408abf6656b3512840d6","ref":"refs/heads/gh-pages","pushedAt":"2024-06-10T18:28:32.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Mon Jun 10 18:28:31 UTC 2024","shortMessageHtmlLink":"Build at Mon Jun 10 18:28:31 UTC 2024"}},{"before":"931587ba139ebfa8fd99ba9d908c8b3c8fbfa2dc","after":"4234262761b971c970be3c669bd8c8c41ba1db14","ref":"refs/heads/main","pushedAt":"2024-06-10T18:21:30.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tqchen","name":"Tianqi Chen","path":"/tqchen","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2577440?s=80&v=4"},"commit":{"message":"[Tokenizer] Priorize HuggingFace/SentencePiece over ByteLevelBPE (#2559)\n\nThis PR updates the tokenzier load logic, so that we prioritize\r\nthe use of HuggingFace and SentencePiece tokenizers over the\r\nByteLevelBPE tokenizer.\r\n\r\nThis fixes the issue that token `<im_start>` in Qwen model is\r\ntokenized into multiple tokens when the ByteLevelBPE tokenizer\r\nis chosen when available.","shortMessageHtmlLink":"[Tokenizer] Priorize HuggingFace/SentencePiece over ByteLevelBPE (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2342339216\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2559\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2559/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2559\">#2559</a>)"}},{"before":"c25834da3b403c8c89784726a7269123ee5c0d91","after":"931587ba139ebfa8fd99ba9d908c8b3c8fbfa2dc","ref":"refs/heads/main","pushedAt":"2024-06-10T18:21:16.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tqchen","name":"Tianqi Chen","path":"/tqchen","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2577440?s=80&v=4"},"commit":{"message":"Fix compilation for gcc 13.2 (#2561)","shortMessageHtmlLink":"Fix compilation for gcc 13.2 (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2343638191\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2561\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2561/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2561\">#2561</a>)"}},{"before":"1e5f3994064cc54483b7e5de92d97490e28a4781","after":"7f4d63d1d92b6696c20b10c334a6aa1a3030c7d3","ref":"refs/heads/gh-pages","pushedAt":"2024-06-09T22:25:30.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Sun Jun  9 22:25:30 UTC 2024","shortMessageHtmlLink":"Build at Sun Jun 9 22:25:30 UTC 2024"}},{"before":"9633c9f6eaa37a41386c5d255efefe60e4cb1a63","after":"c25834da3b403c8c89784726a7269123ee5c0d91","ref":"refs/heads/main","pushedAt":"2024-06-09T22:18:25.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"MasterJH5574","name":"Ruihang Lai","path":"/MasterJH5574","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/45167100?s=80&v=4"},"commit":{"message":"[Docs] Fix typo in mlc_llm chat command (#2560)","shortMessageHtmlLink":"[Docs] Fix typo in mlc_llm chat command (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"2342513725\" data-permission-text=\"Title is private\" data-url=\"https://github.com/mlc-ai/mlc-llm/issues/2560\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/mlc-ai/mlc-llm/pull/2560/hovercard\" href=\"https://github.com/mlc-ai/mlc-llm/pull/2560\">#2560</a>)"}},{"before":null,"after":"9a6244fbc54cdc775ec8b891b9779cd7b576daba","ref":"refs/heads/docs_typo_mlc_chat","pushedAt":"2024-06-09T22:08:04.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"Neet-Nestor","name":"Nestor Qin","path":"/Neet-Nestor","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/23090573?s=80&v=4"},"commit":{"message":"[Docs] Fix typo in mlc_llm chat command","shortMessageHtmlLink":"[Docs] Fix typo in mlc_llm chat command"}},{"before":"b22a3c0b12aa4bb58bbc9ae50996b325b7580e64","after":"1e5f3994064cc54483b7e5de92d97490e28a4781","ref":"refs/heads/gh-pages","pushedAt":"2024-06-08T22:46:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Build at Sat Jun  8 22:46:43 UTC 2024","shortMessageHtmlLink":"Build at Sat Jun 8 22:46:43 UTC 2024"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEbxT4wQA","startCursor":null,"endCursor":null}},"title":"Activity · mlc-ai/mlc-llm"}