1485 Commits

Author SHA1 Message Date
Tyler Michael Smith
02cc3b51a7
[misc] benchmark_serving.py -- add ITL results and tweak TPOT results (#5263) 2024-06-05 10:17:51 -07:00
Simon Mo
d5b1eb081e
[CI] Add nightly benchmarks (#5260) 2024-06-05 09:42:08 -07:00
tomeras91
f0a500545f
[Frontend] OpenAI API server: Add add_special_tokens to ChatCompletionRequest (default False) (#5278) 2024-06-05 09:32:58 -07:00
Woosuk Kwon
c65146e75e
[Misc] Fix docstring of get_attn_backend (#5271) 2024-06-05 09:18:59 -07:00
Woosuk Kwon
41ca62cf03
[Misc] Add CustomOp interface for device portability (#5255) 2024-06-05 09:18:19 -07:00
zifeitong
974fc9b845
[Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to True (#5226) 2024-06-04 19:37:28 -07:00
youkaichao
fee4dcc33a
[Misc] update collect env (#5261) 2024-06-04 17:29:09 -05:00
Michael Goin
650a4cc55e
[Misc] Add transformers version to collect_env.py (#5259) 2024-06-04 12:52:28 -07:00
Simon Mo
9ca62d8668
[CI] mark AMD test as softfail to prevent blockage (#5256) 2024-06-04 11:34:53 -07:00
Li, Jiang
45c35f0d58
[CI/Build] Reducing CPU CI execution time (#5241) 2024-06-04 10:26:40 -07:00
Cyrus Leung
9ba093b4f4
[CI/Build] Simplify model loading for HfRunner (#5251) 2024-06-04 10:09:19 -07:00
Woosuk Kwon
27208be66e
[Kernel] Add back batch size 1536 and 3072 to MoE tuning (#5242) 2024-06-04 09:58:47 -07:00
Jie Fu (傅杰)
87d5abef75
[Bugfix] Fix a bug caused by pip install setuptools>=49.4.0 for CPU backend (#5249) 2024-06-04 09:57:51 -07:00
Cyrus Leung
ec784b2526
[CI/Build] Add inputs tests (#5215) 2024-06-03 21:01:46 -07:00
zifeitong
a58f24e590
[Bugfix] Fix torch.compile() error when using MultiprocessingGPUExecutor (#5229) 2024-06-03 20:55:50 -07:00
afeldman-nm
f42a006b15
[Bugfix]: During testing, use pytest monkeypatch for safely overriding the env var that indicates the vLLM backend (#5210) 2024-06-03 20:32:57 -07:00
Woosuk Kwon
3a434b07ed
[Kernel] Enhance MoE benchmarking & tuning script (#4921) 2024-06-03 20:06:59 -07:00
Zhuohan Li
bd0e7802e0
[Bugfix] Add warmup for prefix caching example (#5235) 2024-06-03 19:36:41 -07:00
Toshiki Kataoka
06b2550cbb
[Bugfix] Support prompt_logprobs==0 (#5217) 2024-06-03 17:59:30 -07:00
Breno Faria
f775a07e30
[FRONTEND] OpenAI tools support named functions (#5032) 2024-06-03 18:25:29 -05:00
Kevin H. Luu
4f0d17c05c
New CI template on AWS stack (#5110)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-03 16:16:43 -07:00
Kaiyang Chen
10c38e3e46
[Misc]: Implement CPU/GPU swapping in BlockManagerV2 (#3834) 2024-06-03 13:37:11 -07:00
Yuan
cafb8e06c5
[CI/BUILD] enable intel queue for longer CPU tests (#4113) 2024-06-03 10:39:50 -07:00
Tyler Michael Smith
cbb2f59cc8
[Kernel] Pass a device pointer into the quantize kernel for the scales (#5159) 2024-06-03 09:52:30 -07:00
Antoni Baum
0ab278ca31
[Core] Remove unnecessary copies in flash attn backend (#5138) 2024-06-03 09:39:31 -07:00
Cyrus Leung
7a64d24aad
[Core] Support image processor (#4197) 2024-06-02 22:56:41 -07:00
Cyrus Leung
dfbe60dc62
[Misc] Simplify code and fix type annotations in conftest.py (#5118) 2024-06-02 16:05:50 -07:00
Divakar Verma
a66cf40b20
[Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927)
This PR enables the fused topk_softmax kernel used in moe layer for HIP
2024-06-02 14:13:26 -07:00
Avinash Raj
f790ad3c50
[Frontend][OpenAI] Support for returning max_model_len on /v1/models response (#4643) 2024-06-02 08:06:13 +00:00
Simon Mo
ed59a7ed23
Update test_ignore_eos (#4898) 2024-06-02 02:21:53 +00:00
Robert Shaw
044793d8df
[BugFix] Prevent LLM.encode for non-generation Models (#5184)
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-06-01 23:35:41 +00:00
Daniil Arapov
c2d6d2f960
[Bugfix]: Fix issues related to prefix caching example (#5177) (#5180) 2024-06-01 15:53:52 -07:00
Zhuohan Li
8279078e21
[Bugfix] Remove deprecated @abstractproperty (#5174) 2024-06-01 22:40:25 +00:00
chenqianfzh
b9c0605a8e
[Feature][Kernel] Support bitsandbytes quantization and QLoRA (#4776) 2024-06-01 14:51:10 -06:00
Nadav Shmayovits
37464a0f74
[Bugfix] Fix call to init_logger in openai server (#4765) 2024-06-01 17:18:50 +00:00
Ye Cao
c354072828
[Minor] Fix the path typo in loader.py: save_sharded_states.py -> save_sharded_state.py (#5151)
Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
2024-06-01 17:11:22 +00:00
Varun Sundar Rabindranath
f081c3ce4b
[Kernel] Update Cutlass fp8 configs (#5144)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-06-01 08:46:07 +00:00
Tyler Michael Smith
260d119e86
[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU (#5137) 2024-06-01 06:45:32 +00:00
Daniele
a360ff80bb
[CI/Build] CMakeLists: build all extensions' cmake targets at the same time (#5034) 2024-05-31 22:06:45 -06:00
Tyler Michael Smith
1197e02141
[Build] Guard against older CUDA versions when building CUTLASS 3.x kernels (#5168) 2024-05-31 17:21:38 -07:00
Nick Hill
657579113f
[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171) 2024-05-31 17:20:19 -07:00
Cody Yu
e9899fb7a4
[Model] Enable FP8 QKV in MoE and refine kernel tuning script (#5039) 2024-05-31 14:29:19 -07:00
functionxu123
a377f0bd5e
[Misc]: optimize eager mode host time (#4196)
Co-authored-by: xuhao <xuhao@cambricon.com>
2024-05-31 13:14:50 +08:00
Simon Mo
e9d3aa04f6
Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)" (#5149) 2024-05-30 22:00:26 -07:00
SnowDist
a22dea54d3
[Model] Support MAP-NEO model (#5081)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-05-30 19:24:41 -07:00
simon-mo
533c217792
Fix cutlass sm_90a vesrion in CMakeList 2024-05-31 02:13:01 +00:00
Alexander Matveev
6d21fa1cad
[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5) (#5136) 2024-05-30 21:02:11 -05:00
Robert Shaw
b35be5403f
[Bugfix] Avoid Warnings in SparseML Activation Quantization (#5120) 2024-05-30 17:04:37 -07:00
Simon Mo
45a1a69b98
[Build] Disable sm_90a in cu11 (#5141) 2024-05-30 14:37:16 -07:00
Simon Mo
87a658c812
Bump version to v0.4.3 (#5046) 2024-05-30 11:13:46 -07:00