Yanyi Liu | ff7424f491 | [Frontend] Support override generation config in args (#12409); Signed-off-by: liuyanyi <wolfsonliu@163.com> | 2025-01-29 01:41:01 -08:00 |
Cyrus Leung | 8f10d5e393 | [Misc] Split up pooling tasks (#10820); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-12-11 01:28:00 -08:00 |
Cyrus Leung | 133707123e | [Model] Replace embedding models with pooling adapter (#10769); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-12-01 08:02:54 +08:00 |
Cyrus Leung | 2ac6d0e75b | [Misc] Consolidate pooler config overrides (#10351); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-11-15 06:59:00 +00:00 |
sroy745 | b41fb9d3b1 | [Encoder Decoder] Update Mllama to run with both FlashAttention and XFormers (#9982); Signed-off-by: Sourashis Roy <sroy@roblox.com> | 2024-11-12 10:53:57 -08:00 |
Krishna Mandal | b09895a618 | [Frontend][Core] Override HF config.json via CLI (#5836); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>; Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-11-09 16:19:27 +00:00 |
Flávia Béo | aa9078fa03 | Adds method to read the pooling types from model's files (#9506); Signed-off-by: Flavia Beo <flavia.beo@ibm.com>; Signed-off-by: Max de Bayser <mbayser@br.ibm.com>; Co-authored-by: Max de Bayser <mbayser@br.ibm.com> | 2024-11-07 08:42:40 +00:00 |
Cyrus Leung | db7db4aab9 | [Misc] Consolidate ModelConfig code related to HF config (#10104); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-11-07 06:00:21 +00:00 |
Cyrus Leung | 051eaf6db3 | [Model] Add user-configurable task for models that support both generation and embedding (#9424) | 2024-10-18 11:31:58 -07:00 |
Cyrus Leung | 7e7eae338d | [Misc] Standardize RoPE handling for Qwen2-VL (#9250) | 2024-10-16 13:56:17 +08:00 |
Michael Goin | 421e218b37 | [Bugfix] Bump transformers to 4.43.2 (#6752) | 2024-07-24 13:22:16 -07:00 |
Roger Wang | 1bedf210e3 | Bump transformers version for Llama 3.1 hotfix and patch Chameleon (#6690) | 2024-07-23 13:47:48 -07:00 |
sasha0552 | dcbf4286af | [Frontend] Customizable RoPE theta (#5197) | 2024-06-11 10:42:26 -07:00 |
Zhuohan Li | 1102bef219 | [Bugfix / Core] Prefix Caching Guards (merged with main) (#4846); Co-authored-by: rsnm2 <rshaw@neuralmagic.com>; Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> | 2024-05-27 15:18:17 -07:00 |
sasha0552 | 9b9a10d6cb | [Frontend] Dynamic RoPE scaling (#4638) | 2024-05-22 01:32:35 -04:00 |
Antoni Baum | 69e1d2fb69 | [Core] Refactor model loading code (#4097) | 2024-04-16 11:34:39 -07:00 |
陈序 | 54be8a0be2 | Fix assertion failure in Qwen 1.5 with prefix caching enabled (#3373); Co-authored-by: Cade Daniel <edacih@gmail.com> | 2024-03-14 13:56:57 -07:00 |