2679 Commits

Author SHA1 Message Date
Rui Qiao
cbdb252259
[Misc] Limit to ray[adag] 2.35 to avoid backward incompatible change (#8509)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-09-17 00:06:26 -07:00
youkaichao
99aa4eddaf
[torch.compile] register allreduce operations as custom ops (#8526)
2024-09-16 22:57:57 -07:00
Roger Wang
ee2bceaaa6
[Misc][Bugfix] Disable guided decoding for mistral tokenizer (#8521)
2024-09-16 22:22:45 -07:00
Alex Brooks
1c1bb388e0
[Frontend] Improve Nullable kv Arg Parsing (#8525)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-09-17 04:17:32 +00:00
Simon Mo
546034b466
[refactor] remove triton based sampler (#8524)
2024-09-16 20:04:48 -07:00
Joe Runde
cca61642e0
[Bugfix] Fix 3.12 builds on main (#8510)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-09-17 00:01:45 +00:00
Simon Mo
5ce45eb54d
[misc] small qol fixes for release process (#8517)
2024-09-16 15:11:27 -07:00
Simon Mo
5478c4b41f
[perf bench] set timeout to debug hanging (#8516)
2024-09-16 14:30:02 -07:00
Kevin Lin
47f5e03b5b
[Bugfix] Bind api server port before starting engine (#8491)
2024-09-16 13:56:28 -07:00
youkaichao
2759a43a26
[doc] update doc on testing and debugging (#8514)
2024-09-16 12:10:23 -07:00
Luka Govedič
5d73ae49d6
[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (#7270)
2024-09-16 11:52:40 -07:00
sasha0552
781e3b9a42
[Bugfix][Kernel] Fix build for sm_60 in GGUF kernel (#8506)
2024-09-16 12:15:57 -06:00
Nick Hill
acd5511b6d
[BugFix] Fix clean shutdown issues (#8492)
2024-09-16 09:33:46 -07:00
lewtun
837c1968f9
[Frontend] Expose revision arg in OpenAI server (#8501)
2024-09-16 15:55:26 +00:00
ElizaWszola
a091e2da3e
[Kernel] Enable 8-bit weights in Fused Marlin MoE (#8032)
Co-authored-by: Dipika <dipikasikka1@gmail.com>
2024-09-16 09:47:19 -06:00
Isotr0py
fc990f9795
[Bugfix][Kernel] Add IQ1_M quantization implementation to GGUF kernel (#8357)
2024-09-15 16:51:44 -06:00
Chris
3724d5f6b5
[Bugfix][Model] Fix Python 3.8 compatibility in Pixtral model by updating type annotations (#8490)
2024-09-15 04:20:05 +00:00
Woosuk Kwon
50e9ec41fc
[TPU] Implement multi-step scheduling (#8489)
2024-09-14 16:58:31 -07:00
youkaichao
47790f3e32
[torch.compile] add a flag to disable custom op (#8488)
2024-09-14 13:07:16 -07:00
youkaichao
a36e070dad
[torch.compile] fix functionalization (#8480)
2024-09-14 09:46:04 -07:00
ywfang
8a0cf1ddc3
[Model] support minicpm3 (#8297)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
Charlie Fu
1ef0d2efd0
[Kernel][Hardware][Amd]Custom paged attention kernel for rocm (#8310)
2024-09-13 17:01:11 -07:00
Kunshang Ji
851725202a
[Hardware][intel GPU] bump up ipex version to 2.3 (#8365)
Co-authored-by: Yan Ma <yan.ma@intel.com>
2024-09-13 16:54:34 -07:00
Simon Mo
9ba0817ff1
bump version to v0.6.1.post2 (#8473)
2024-09-13 11:35:00 -07:00
Nick Hill
18e9e1f7b3
[HotFix] Fix final output truncation with stop string + streaming (#8468)
2024-09-13 11:31:12 -07:00
Isotr0py
f57092c00b
[Doc] Add oneDNN installation to CPU backend documentation (#8467)
2024-09-13 18:06:30 +00:00
Cyrus Leung
a84e598e21
[CI/Build] Reorganize models tests (#7820)
2024-09-13 10:20:06 -07:00
youkaichao
0a4806f0a9
[plugin][torch.compile] allow to add custom compile backend (#8445)
2024-09-13 09:32:42 -07:00
Cyrus Leung
ecd7a1d5b6
[Installation] Gate FastAPI version for Python 3.8 (#8456)
2024-09-13 09:02:26 -07:00
youkaichao
a2469127db
[misc][ci] fix quant test (#8449)
2024-09-13 17:20:14 +08:00
Jee Jee Li
06311e2956
[Misc] Skip loading extra bias for Qwen2-VL GPTQ-Int8 (#8442)
2024-09-13 07:58:28 +00:00
youkaichao
cab69a15e4
[doc] recommend pip instead of conda (#8446)
2024-09-12 23:52:41 -07:00
Isotr0py
9b4a3b235e
[CI/Build] Enable InternVL2 PP test only on single node (#8437)
2024-09-13 06:35:20 +00:00
Simon Mo
acda0b35d0
bump version to v0.6.1.post1 (#8440)
2024-09-12 21:39:49 -07:00
William Lin
ba77527955
[bugfix] torch profiler bug for single gpu with GPUExecutor (#8354)
2024-09-12 21:30:00 -07:00
Alexander Matveev
6821020109
[Bugfix] Fix async log stats (#8417)
2024-09-12 20:48:59 -07:00
Cyrus Leung
8427550488
[CI/Build] Update pixtral tests to use JSON (#8436)
2024-09-13 03:47:52 +00:00
Cyrus Leung
3f79bc3d1a
[Bugfix] Bump fastapi and pydantic version (#8435)
2024-09-13 03:21:42 +00:00
shangmingc
40c396533d
[Bugfix] Mapping physical device indices for e2e test utils (#8290)
2024-09-13 11:06:28 +08:00
Cyrus Leung
5ec9c0fb3c
[Core] Factor out input preprocessing to a separate class (#7329)
2024-09-13 02:56:13 +00:00
Dipika Sikka
8f44a92d85
[BugFix] fix group_topk (#8430)
2024-09-13 09:23:42 +08:00
Roger Wang
360ddbd37e
[Misc] Update Pixtral example (#8431)
2024-09-12 17:31:18 -07:00
Wenxiang
a480939e8e
[Bugfix] Fix weight loading issue by rename variable. (#8293)
2024-09-12 19:25:00 -04:00
Patrick von Platen
d31174a4e1
[Hotfix][Pixtral] Fix multiple images bugs (#8415)
2024-09-12 15:21:51 -07:00
Roger Wang
b61bd98f90
[CI/Build] Disable multi-node test for InternVL2 (#8428)
2024-09-12 15:05:35 -07:00
Roger Wang
c16369455f
[Hotfix][Core][VLM] Disable chunked prefill by default and prefix caching for multimodal models (#8425)
2024-09-12 14:06:51 -07:00
Alexander Matveev
019877253b
[Bugfix] multi-step + flashinfer: ensure cuda graph compatible (#8427)
2024-09-12 21:01:50 +00:00
Nick Hill
551ce01078
[Core] Add engine option to return only deltas or final output (#7381)
2024-09-12 12:02:00 -07:00
William Lin
a6c0f3658d
[multi-step] add flashinfer backend (#7928)
2024-09-12 11:16:22 -07:00
Joe Runde
f2e263b801
[Bugfix] Offline mode fix (#8376)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-09-12 11:11:57 -07:00