Russell Bryant
|
731aec5be7
|
[CI/Build] Limit github CI jobs based on files changed (#9928)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-05 10:30:42 -08:00 |
|
Chenghao (Alan) Yang
|
09d3550372
|
[Misc] Add logging for CUDA memory (#10027)
Signed-off-by: Chenghao Yang <yangalan1996@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Chenghao Yang <yangalan1996@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-11-05 09:50:50 -08:00 |
|
Richard Liu
|
cd34029e91
|
Refactor TPU requirements file and pin build dependencies (#10010)
Signed-off-by: Richard Liu <ricliu@google.com>
|
2024-11-05 16:48:44 +00:00 |
|
Russell Bryant
|
5952d81139
|
[Frontend] Fix tcp port reservation for api server (#10012)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-05 07:50:57 -08:00 |
|
Chauncey
|
93dee88f6b
|
[Misc] vllm CLI flags should be ordered for better user readability (#10017)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2024-11-05 18:59:56 +08:00 |
|
Gene Der Su
|
7a83b1aec0
|
[BugFix] Lazy import ray (#10021)
|
2024-11-05 10:04:10 +00:00 |
|
Tyler Michael Smith
|
ad23318928
|
[Bugfix] Fixup Mamba (#10004)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-11-05 03:46:38 +00:00 |
|
Cyrus Leung
|
bbc3619dc8
|
[Core] Make encoder-decoder inputs a nested structure to be more composable (#9604)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-05 10:07:31 +08:00 |
|
Tyler Michael Smith
|
04bbf38e05
|
[Core] Use os.sched_yield in ShmRingBuffer instead of time.sleep (#9994)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-11-05 01:08:21 +00:00 |
|
Michael Goin
|
8f0a9ca890
|
[Bugfix] Respect modules_to_not_convert within awq_marlin (#9895)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-11-04 16:57:44 -07:00 |
|
youkaichao
|
2094062b4e
|
[4.5/N] bugfix for quant config in speculative decode (#10007)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-04 15:11:59 -08:00 |
|
bnellnm
|
d93478b399
|
[Bugfix] Upgrade to pytorch 2.5.1 (#10001)
Signed-off-by: Bill Nell <bill@neuralmagic.com>
|
2024-11-04 15:11:28 -08:00 |
|
tomeras91
|
ac04a97a9f
|
[Frontend] Add max_tokens prometheus metric (#9881)
Signed-off-by: Tomer Asida <tomera@ai21.com>
|
2024-11-04 22:53:24 +00:00 |
|
lkchen
|
9a5664d4a4
|
[Misc] Refactor benchmark_throughput.py (#9779)
Signed-off-by: Linkun Chen <github+anyscale@lkchen.net>
Co-authored-by: Linkun Chen <lkchen@github.com>
Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>
|
2024-11-04 14:32:16 -08:00 |
|
Robert Shaw
|
04cef2c6ab
|
[Bugfix] Fix MQLLMEngine hanging (#9973)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2024-11-04 16:01:43 -05:00 |
|
Roger Wang
|
6e056bcf04
|
[Doc] Update VLM doc about loading from local files (#9999)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-11-04 19:47:11 +00:00 |
|
hissu-hyvarinen
|
5208dc7a20
|
[Bugfix][CI/Build][Hardware][AMD] Shard ID parameters in AMD tests running parallel jobs (#9279)
Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>
|
2024-11-04 11:37:46 -08:00 |
|
Robert Shaw
|
1c45f4c385
|
[CI] Basic Integration Test For TPU (#9968)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
|
2024-11-04 11:34:26 -08:00 |
|
Mor Zusman
|
603a661ae8
|
[Model] factoring out MambaMixer out of Jamba (#8993)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
|
2024-11-04 18:00:00 +00:00 |
|
Jee Jee Li
|
fb2716d641
|
[Misc]Reduce BNB static variable (#9987)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-04 17:04:40 +00:00 |
|
youkaichao
|
8d72bb20fa
|
[4/N] make quant config first-class citizen (#9978)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-04 08:51:31 -08:00 |
|
Chauncey
|
ac6b8f19b9
|
[Frontend] Multi-Modality Support for Loading Local Image Files (#9915)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2024-11-04 15:34:57 +00:00 |
|
Mengqing Cao
|
ccb5376a9a
|
[Bugfix][OpenVINO] Fix circular reference #9939 (#9974)
Signed-off-by: MengqingCao <cmq0113@163.com>
|
2024-11-04 18:14:13 +08:00 |
|
Tran Quang Dai
|
ea4adeddc1
|
[Bugfix] Fix E2EL mean and median stats (#9984)
Signed-off-by: daitran2k1 <tranquangdai7a@gmail.com>
|
2024-11-04 09:37:58 +00:00 |
|
Yang Zheng
|
4dbcbbeb09
|
[Misc] Compute query_start_loc/seq_start_loc on CPU (#9447)
Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>
|
2024-11-04 08:54:37 +00:00 |
|
Gregory Shtrasberg
|
b67feb1274
|
[Bugfix]Using the correct type hints (#9885)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2024-11-04 06:19:51 +00:00 |
|
Jee Jee Li
|
c49f0407ba
|
[Bugfix] Fix MiniCPMV and Mllama BNB bug (#9917)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-04 03:36:41 +00:00 |
|
Robert Shaw
|
91c9ebbb1b
|
[V1] Fix Configs (#9971)
|
2024-11-04 00:24:40 +00:00 |
|
shanshan wang
|
54597724f4
|
[Model] Add support for H2OVL-Mississippi models (#9747)
Signed-off-by: Shanshan Wang <shanshan.wang@h2o.ai>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-11-04 00:15:36 +00:00 |
|
Nick Hill
|
1f1b6d6eda
|
[V1] Support per-request seed (#9945)
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
|
2024-11-03 09:14:17 -08:00 |
|
youkaichao
|
3bb4befea7
|
[bugfix] fix tsts (#9959)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-02 15:54:05 -07:00 |
|
Yongzao
|
ae5279a163
|
[torch.compile] Adding torch compile to vision-language models (#9946)
|
2024-11-02 12:56:05 -07:00 |
|
Nikita Furin
|
1b73ab2a1f
|
[CI/Build] Quoting around > (#9956)
|
2024-11-02 12:50:28 -07:00 |
|
youkaichao
|
cea808f325
|
[3/N] model runner pass the whole config to model (#9958)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-02 12:08:49 -07:00 |
|
youkaichao
|
74b529ceee
|
[bugfix] fix chatglm dummy_data_for_glmv (#9955)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-02 08:03:33 -07:00 |
|
Robert Shaw
|
d6459b4516
|
[V1] Fix EngineArgs refactor on V1 (#9954)
|
2024-11-02 07:44:38 -07:00 |
|
youkaichao
|
e893795443
|
[2/N] executor pass the complete config to worker/modelrunner (#9938)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2024-11-02 07:35:05 -07:00 |
|
Michael Green
|
1d4cfe2be1
|
[Doc] Updated tpu-installation.rst with more details (#9926)
Signed-off-by: Michael Green <mikegre@google.com>
|
2024-11-02 10:06:45 -04:00 |
|
Nick Hill
|
eed92f12fc
|
[Docs] Update Granite 3.0 models in supported models table (#9930)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-11-02 09:02:18 +00:00 |
|
youkaichao
|
af7380d83b
|
[torch.compile] fix cpu broken code (#9947)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-01 23:35:47 -07:00 |
|
sroy745
|
a78dd3303e
|
[Encoder Decoder] Add flash_attn kernel support for encoder-decoder models (#9559)
|
2024-11-01 23:22:49 -07:00 |
|
Kevin H. Luu
|
d522034c85
|
[ci/build] Have dependabot ignore pinned dependencies (#9935)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-11-01 23:56:13 +00:00 |
|
Peter Salas
|
6c0b7f548d
|
[Core][VLM] Add precise multi-modal placeholder tracking (#8346)
Signed-off-by: Peter Salas <peter@fixie.ai>
|
2024-11-01 16:21:10 -07:00 |
|
dependabot[bot]
|
d151fde834
|
[ci/build] Bump the patch-update group with 10 updates (#9897)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
|
2024-11-01 23:04:42 +00:00 |
|
Gene Der Su
|
27cd36e6e2
|
[Bugfix] PicklingError on RayTaskError (#9934)
Signed-off-by: Gene Su <e870252314@gmail.com>
|
2024-11-01 22:08:23 +00:00 |
|
youkaichao
|
18bd7587b7
|
[1/N] pass the complete config from engine to executor (#9933)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-01 13:51:57 -07:00 |
|
Pavani Majety
|
598b6d7b07
|
[Bugfix/Core] Flashinfer k_scale and v_scale (#9861)
|
2024-11-01 12:15:05 -07:00 |
|
youkaichao
|
aff1fd8188
|
[torch.compile] use interpreter with stable api from pytorch (#9889)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-01 11:50:37 -07:00 |
|
André Jonasson
|
4581d2cc02
|
[Core] Refactor: Clean up unused argument in Scheduler._preempt (#9696)
Signed-off-by: André Jonasson <andre.jonasson@gmail.com>
|
2024-11-01 11:41:38 -07:00 |
|
Travis Johnson
|
1dd4cb2935
|
[Bugfix] Fix edge cases for MistralTokenizer (#9625)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2024-11-01 10:33:15 -07:00 |
|