Rui Qiao
|
cbdb252259
|
[Misc] Limit to ray[adag] 2.35 to avoid backward incompatible change (#8509)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-09-17 00:06:26 -07:00 |
|
youkaichao
|
99aa4eddaf
|
[torch.compile] register allreduce operations as custom ops (#8526)
|
2024-09-16 22:57:57 -07:00 |
|
Roger Wang
|
ee2bceaaa6
|
[Misc][Bugfix] Disable guided decoding for mistral tokenizer (#8521)
|
2024-09-16 22:22:45 -07:00 |
|
Alex Brooks
|
1c1bb388e0
|
[Frontend] Improve Nullable kv Arg Parsing (#8525)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-09-17 04:17:32 +00:00 |
|
Simon Mo
|
546034b466
|
[refactor] remove triton based sampler (#8524)
|
2024-09-16 20:04:48 -07:00 |
|
Joe Runde
|
cca61642e0
|
[Bugfix] Fix 3.12 builds on main (#8510)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-09-17 00:01:45 +00:00 |
|
Simon Mo
|
5ce45eb54d
|
[misc] small qol fixes for release process (#8517)
|
2024-09-16 15:11:27 -07:00 |
|
Simon Mo
|
5478c4b41f
|
[perf bench] set timeout to debug hanging (#8516)
|
2024-09-16 14:30:02 -07:00 |
|
Kevin Lin
|
47f5e03b5b
|
[Bugfix] Bind api server port before starting engine (#8491)
|
2024-09-16 13:56:28 -07:00 |
|
youkaichao
|
2759a43a26
|
[doc] update doc on testing and debugging (#8514)
|
2024-09-16 12:10:23 -07:00 |
|
Luka Govedič
|
5d73ae49d6
|
[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (#7270)
|
2024-09-16 11:52:40 -07:00 |
|
sasha0552
|
781e3b9a42
|
[Bugfix][Kernel] Fix build for sm_60 in GGUF kernel (#8506)
|
2024-09-16 12:15:57 -06:00 |
|
Nick Hill
|
acd5511b6d
|
[BugFix] Fix clean shutdown issues (#8492)
|
2024-09-16 09:33:46 -07:00 |
|
lewtun
|
837c1968f9
|
[Frontend] Expose revision arg in OpenAI server (#8501)
|
2024-09-16 15:55:26 +00:00 |
|
ElizaWszola
|
a091e2da3e
|
[Kernel] Enable 8-bit weights in Fused Marlin MoE (#8032)
Co-authored-by: Dipika <dipikasikka1@gmail.com>
|
2024-09-16 09:47:19 -06:00 |
|
Isotr0py
|
fc990f9795
|
[Bugfix][Kernel] Add IQ1_M quantization implementation to GGUF kernel (#8357)
|
2024-09-15 16:51:44 -06:00 |
|
Chris
|
3724d5f6b5
|
[Bugfix][Model] Fix Python 3.8 compatibility in Pixtral model by updating type annotations (#8490)
|
2024-09-15 04:20:05 +00:00 |
|
Woosuk Kwon
|
50e9ec41fc
|
[TPU] Implement multi-step scheduling (#8489)
|
2024-09-14 16:58:31 -07:00 |
|
youkaichao
|
47790f3e32
|
[torch.compile] add a flag to disable custom op (#8488)
|
2024-09-14 13:07:16 -07:00 |
|
youkaichao
|
a36e070dad
|
[torch.compile] fix functionalization (#8480)
|
2024-09-14 09:46:04 -07:00 |
|
ywfang
|
8a0cf1ddc3
|
[Model] support minicpm3 (#8297)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-14 14:50:26 +00:00 |
|
Charlie Fu
|
1ef0d2efd0
|
[Kernel][Hardware][Amd]Custom paged attention kernel for rocm (#8310)
|
2024-09-13 17:01:11 -07:00 |
|
Kunshang Ji
|
851725202a
|
[Hardware][intel GPU] bump up ipex version to 2.3 (#8365)
Co-authored-by: Yan Ma <yan.ma@intel.com>
|
2024-09-13 16:54:34 -07:00 |
|
Simon Mo
|
9ba0817ff1
|
bump version to v0.6.1.post2 (#8473)
|
2024-09-13 11:35:00 -07:00 |
|
Nick Hill
|
18e9e1f7b3
|
[HotFix] Fix final output truncation with stop string + streaming (#8468)
|
2024-09-13 11:31:12 -07:00 |
|
Isotr0py
|
f57092c00b
|
[Doc] Add oneDNN installation to CPU backend documentation (#8467)
|
2024-09-13 18:06:30 +00:00 |
|
Cyrus Leung
|
a84e598e21
|
[CI/Build] Reorganize models tests (#7820)
|
2024-09-13 10:20:06 -07:00 |
|
youkaichao
|
0a4806f0a9
|
[plugin][torch.compile] allow to add custom compile backend (#8445)
|
2024-09-13 09:32:42 -07:00 |
|
Cyrus Leung
|
ecd7a1d5b6
|
[Installation] Gate FastAPI version for Python 3.8 (#8456)
|
2024-09-13 09:02:26 -07:00 |
|
youkaichao
|
a2469127db
|
[misc][ci] fix quant test (#8449)
|
2024-09-13 17:20:14 +08:00 |
|
Jee Jee Li
|
06311e2956
|
[Misc] Skip loading extra bias for Qwen2-VL GPTQ-Int8 (#8442)
|
2024-09-13 07:58:28 +00:00 |
|
youkaichao
|
cab69a15e4
|
[doc] recommend pip instead of conda (#8446)
|
2024-09-12 23:52:41 -07:00 |
|
Isotr0py
|
9b4a3b235e
|
[CI/Build] Enable InternVL2 PP test only on single node (#8437)
|
2024-09-13 06:35:20 +00:00 |
|
Simon Mo
|
acda0b35d0
|
bump version to v0.6.1.post1 (#8440)
|
2024-09-12 21:39:49 -07:00 |
|
William Lin
|
ba77527955
|
[bugfix] torch profiler bug for single gpu with GPUExecutor (#8354)
|
2024-09-12 21:30:00 -07:00 |
|
Alexander Matveev
|
6821020109
|
[Bugfix] Fix async log stats (#8417)
|
2024-09-12 20:48:59 -07:00 |
|
Cyrus Leung
|
8427550488
|
[CI/Build] Update pixtral tests to use JSON (#8436)
|
2024-09-13 03:47:52 +00:00 |
|
Cyrus Leung
|
3f79bc3d1a
|
[Bugfix] Bump fastapi and pydantic version (#8435)
|
2024-09-13 03:21:42 +00:00 |
|
shangmingc
|
40c396533d
|
[Bugfix] Mapping physical device indices for e2e test utils (#8290)
|
2024-09-13 11:06:28 +08:00 |
|
Cyrus Leung
|
5ec9c0fb3c
|
[Core] Factor out input preprocessing to a separate class (#7329)
|
2024-09-13 02:56:13 +00:00 |
|
Dipika Sikka
|
8f44a92d85
|
[BugFix] fix group_topk (#8430)
|
2024-09-13 09:23:42 +08:00 |
|
Roger Wang
|
360ddbd37e
|
[Misc] Update Pixtral example (#8431)
|
2024-09-12 17:31:18 -07:00 |
|
Wenxiang
|
a480939e8e
|
[Bugfix] Fix weight loading issue by rename variable. (#8293)
|
2024-09-12 19:25:00 -04:00 |
|
Patrick von Platen
|
d31174a4e1
|
[Hotfix][Pixtral] Fix multiple images bugs (#8415)
|
2024-09-12 15:21:51 -07:00 |
|
Roger Wang
|
b61bd98f90
|
[CI/Build] Disable multi-node test for InternVL2 (#8428)
|
2024-09-12 15:05:35 -07:00 |
|
Roger Wang
|
c16369455f
|
[Hotfix][Core][VLM] Disable chunked prefill by default and prefix caching for multimodal models (#8425)
|
2024-09-12 14:06:51 -07:00 |
|
Alexander Matveev
|
019877253b
|
[Bugfix] multi-step + flashinfer: ensure cuda graph compatible (#8427)
|
2024-09-12 21:01:50 +00:00 |
|
Nick Hill
|
551ce01078
|
[Core] Add engine option to return only deltas or final output (#7381)
|
2024-09-12 12:02:00 -07:00 |
|
William Lin
|
a6c0f3658d
|
[multi-step] add flashinfer backend (#7928)
|
2024-09-12 11:16:22 -07:00 |
|
Joe Runde
|
f2e263b801
|
[Bugfix] Offline mode fix (#8376)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-09-12 11:11:57 -07:00 |
|