6 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Roger Wang | 1bedf210e3 | Bump transformers version for Llama 3.1 hotfix and patch Chameleon (#6690) | 2024-07-23 13:47:48 -07:00 |
| sasha0552 | dcbf4286af | [Frontend] Customizable RoPE theta (#5197) | 2024-06-11 10:42:26 -07:00 |
| Zhuohan Li | 1102bef219 | [Bugfix / Core] Prefix Caching Guards (merged with main) (#4846) (Co-authored-by: rsnm2 <rshaw@neuralmagic.com>, Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>) | 2024-05-27 15:18:17 -07:00 |
| sasha0552 | 9b9a10d6cb | [Frontend] Dynamic RoPE scaling (#4638) | 2024-05-22 01:32:35 -04:00 |
| Antoni Baum | 69e1d2fb69 | [Core] Refactor model loading code (#4097) | 2024-04-16 11:34:39 -07:00 |
| 陈序 | 54be8a0be2 | Fix assertion failure in Qwen 1.5 with prefix caching enabled (#3373) (Co-authored-by: Cade Daniel <edacih@gmail.com>) | 2024-03-14 13:56:57 -07:00 |