20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Vincent	a4f1ee35d6	Deprecate `best_of` Sampling Parameter in anticipation for vLLM V1 (#13997 ) Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com> Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-05 20:22:43 +00:00
Jee Jee Li	7bab4bb048	[Misc] Add Qwen2MoeForCausalLM moe tuning support (#14276 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-05 23:11:29 +08:00
Michael Goin	f78c0be80a	Fix benchmark_moe.py tuning for CUDA devices (#14164 )	2025-03-03 21:11:03 -08:00
Divakar Verma	bb5b640359	[core] moe fp8 block quant tuning support (#14068 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-03-04 01:30:23 +00:00
TJian	848a6438ae	[ROCm] Faster Custom Paged Attention kernels (#12348 )	2025-03-03 09:24:45 -08:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971 )	2025-03-02 17:34:51 -08:00
YajieWang	6a92ff93e1	[Misc][Kernel]: Add GPTQAllSpark Quantization (#12931 )	2025-02-28 22:30:59 -08:00
Jee Jee Li	6a84164add	[Bugfix] Add file lock for ModelScope download (#14060 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-01 06:10:28 +00:00
Brayden Zhong	ec8a5e5386	[Misc]: Add support for goodput on guided benchmarking + TPOT calculation refactor (#13736 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-02-26 19:06:47 +08:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
Jongseok Park	781096e385	Expert Parallelism (EP) Support for DeepSeek V2 (#12583 )	2025-02-24 07:33:20 -08:00
Huy Do	e7ef74e26e	Fix some issues with benchmark data output (#13641 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-24 10:23:18 +08:00
Roger Wang	9bebc9512f	[Misc] Deprecate `--dataset` from `benchmark_serving.py` (#13708 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-23 13:32:20 +00:00
Cyrus Leung	7f6bae561c	[CI/Build] Fix pre-commit errors (#13696 )	2025-02-22 00:31:26 -08:00
Robin	8aca27fa11	[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len (#13691 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-02-22 14:10:38 +08:00
Huy Do	45186834a0	Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-17 08:16:32 +00:00
Keyun Tong	3ee696a63d	[RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM (#12518 ) Signed-off-by: Keyun Tong <tongkeyun@gmail.com>	2025-02-12 12:25:58 +08:00
Woosuk Kwon	58047c6f04	[Benchmark] Add BurstGPT to benchmark_serving (#13063 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-02-10 21:25:30 -08:00
Cyrus Leung	8a69e0e20e	[CI/Build] Auto-fix Markdown files (#12941 )	2025-02-08 04:25:15 -08:00
Varun Sundar Rabindranath	7e1837676a	[misc] Add LoRA to benchmark_serving (#12898 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-08 17:15:44 +08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Tyler Michael Smith	cfa134d247	[Bugfix/CI] Fixup benchmark_moe.py (#12562 ) Fixes `is_marlin` not being passed into `get_default_config` Also allow `--tensor-parallel-size` in addition to `-tp` and `--tp-size` Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-01 13:41:35 +08:00
Lucas Wilkinson	9798b2fb00	[Kernel] Update `cutlass_scaled_mm` to support 2d group (blockwise) scaling (#11868 )	2025-01-30 18:33:00 -08:00
Divakar Verma	1c1bb0bbf2	[Misc][MoE] add Deepseek-V3 moe tuning support (#12558 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-30 00:47:30 +00:00
Harry Mellor	823ab79633	Update `pre-commit` hooks (#12475 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-27 17:23:08 -07:00
Junichi Sato	3bb8e2c9a2	[Misc] Enable proxy support in benchmark script (#12356 ) Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>	2025-01-24 14:58:26 +00:00
Roger Wang	3c818bdb42	[Misc] Use VisionArena Dataset for VLM Benchmarking (#12389 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-24 00:22:04 -08:00
Junichi Sato	9726ad676d	[Misc] Fix OpenAI API Compatibility Issues in Benchmark Script (#12357 ) Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>	2025-01-23 17:02:13 -05:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
Nick Hill	222a9dc350	[Benchmark] More accurate TPOT calc in `benchmark_serving.py` (#12288 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-01-22 13:46:14 +08:00
Divakar Verma	2acba47d9b	[bugfix] moe tuning. rm is_navi() (#12273 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-21 22:47:32 +00:00
gujing	936db119ed	benchmark_serving support --served-model-name param (#12109 ) Signed-off-by: zibai <zibai.gj@alibaba-inc.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-01-19 09:59:56 +00:00
Divakar Verma	8027a72461	[ROCm][MoE] moe tuning support for rocm (#12049 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-17 14:49:16 +08:00
Varun Sundar Rabindranath	5fd24ec02e	[misc] Add LoRA kernel micro benchmarks (#11579 )	2025-01-16 15:51:40 +00:00
elijah	c6db21313c	bugfix: Fix signature mismatch in benchmark's `get_tokenizer` function (#11982 ) Signed-off-by: elijah <f1renze.142857@gmail.com>	2025-01-13 15:22:07 +00:00
minmin	8a579408f3	[Misc] Update benchmark_prefix_caching.py fixed example usage (#11920 ) Signed-off-by: Ren MinMin <renmm6@chinaunicom.cn> Co-authored-by: Ren MinMin <renmm6@chinaunicom.cn>	2025-01-10 20:39:22 +00:00
Kuntai Du	5959564f94	Doc fix in `benchmark_long_document_qa_throughput.py` (#11933 ) Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2025-01-10 23:51:43 +08:00
Ye (Charlotte) Qi	1d967acb45	[Bugfix] fix beam search input errors and latency benchmark script (#11875 ) Signed-off-by: Ye Qi <yeq@meta.com> Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com>	2025-01-09 17:36:39 +08:00
Divakar Verma	4d29e91be8	[Misc] sort torch profiler table by kernel timing (#11813 )	2025-01-08 10:57:04 +08:00
Yihua Cheng	0c6f998554	[Benchmark] Add benchmark script for CPU offloading (#11533 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: KuntaiDu <kuntai@uchicago.edu>	2025-01-01 00:10:55 +00:00
Jiaxin Shan	fc601665eb	[Misc] Update disaggregation benchmark scripts and test logs (#11456 ) Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>	2024-12-25 06:58:48 +00:00
Varun Sundar Rabindranath	98356735ac	[misc] benchmark_throughput : Add LoRA (#11267 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-12-19 15:43:16 +08:00
Dipika Sikka	60508ffda9	[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995 ) Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com> Co-authored-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Rahul Tuli <rahul@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2024-12-18 09:57:16 -05:00
Roger Wang	02222a0256	[Misc] Kernel Benchmark for `RMSNorm` (#11241 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Xiaoyu Zhang <BBuf@users.noreply.github.com>	2024-12-17 06:57:02 +00:00
Alexander Matveev	238c0d93b4	[Misc] Add tokenizer_mode param to benchmark_serving.py (#11174 ) Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>	2024-12-13 16:19:10 +00:00
Luka Govedič	30870b4f66	[torch.compile] Dynamic fp8 + rms_norm fusion (#10906 ) Signed-off-by: luka <luka@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-12-13 03:19:23 +00:00
Chendi.Xue	82eb5ea8f3	Benchmark serving structured output (#10880 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-12-04 16:28:21 -05:00
Chendi.Xue	381ac93bb5	[Benchmark] Benchmark structured output with datasets (#10557 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2024-12-03 17:21:06 -07:00
Michael Goin	4433195ab7	[Bugfix] Prevent benchmark_throughput.py from using duplicated random prompts (#10753 )	2024-12-03 02:26:15 +00:00
Kuntai Du	0590ec3fd9	[Core] Implement disagg prefill by StatelessProcessGroup (#10502 ) This PR provides initial support for single-node disaggregated prefill in 1P1D scenario. Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>	2024-12-01 19:01:00 -06:00

245 Commits