afeldman-nm | 543aa48573 | 2024-07-08 17:12:15 +00:00
[Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Ilya Lavrenov | 57f09a419c | 2024-06-28 13:50:16 +00:00
[Hardware][Intel] OpenVINO vLLM backend (#5379)

afeldman-nm | f42a006b15 | 2024-06-03 20:32:57 -07:00
[Bugfix]: During testing, use pytest monkeypatch for safely overriding the env var that indicates the vLLM backend (#5210)

Cody Yu | ee3eea0a1b | 2024-05-23 07:55:56 +09:00
[Misc] Take user preference in attention selector (#4960)