20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Lucas Wilkinson	a8d604ca2a	[Misc] Disambiguate quantized types via a new ScalarType (#6396 )	2024-08-02 13:51:58 -07:00
Michael Goin	b482b9a5b1	[CI/Build] Add support for Python 3.12 (#7035 )	2024-08-02 13:51:22 -07:00
youkaichao	806949514a	[ci] set timeout for test_oot_registration.py (#7082 )	2024-08-02 10:03:24 -07:00
Jie Fu (傅杰)	c16eaac500	[Hardware][Intel CPU] Update torch 2.4.0 for CPU backend (#6931 )	2024-08-02 08:55:58 -07:00
Peng Guanwen	db35186391	[Core] Comment out unused code in sampler (#7023 )	2024-08-02 00:58:26 -07:00
youkaichao	660dea1235	[cuda][misc] remove error_on_invalid_device_count_status (#7069 )	2024-08-02 00:14:21 -07:00
Bongwon Jang	cf2a1a4d9d	Fix tracing.py (#7065 )	2024-08-01 23:28:00 -07:00
youkaichao	252357793d	[ci][distributed] try to fix pp test (#7054 )	2024-08-01 22:03:12 -07:00
Cyrus Leung	3bb4b1e4cd	[mypy] Speed up mypy checking (#7056 )	2024-08-01 19:49:43 -07:00
Lily Liu	954f7305a1	[Kernel] Fix input for flashinfer prefill wrapper. (#7008 )	2024-08-01 18:44:16 -07:00
Woosuk Kwon	6ce01f3066	[Performance] Optimize `get_seqs` (#7051 )	2024-08-01 18:29:52 -07:00
Tyler Michael Smith	6a11fdfbb8	[CI/Build][Bugfix] Fix CUTLASS header-only line (#7034 )	2024-08-01 13:51:15 -07:00
Woosuk Kwon	805a8a75f2	[Misc] Support attention logits soft-capping with flash-attn (#7022 )	2024-08-01 13:14:37 -07:00
omkar kakarparthi	562e580abc	Update run-amd-test.sh (#7044 )	2024-08-01 13:12:37 -07:00
Murali Andoorveedu	fc912e0886	[Models] Support Qwen model with PP (#6974 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-08-01 12:40:43 -07:00
Michael Goin	f4fd390f5d	[Bugfix] Lower gemma's unloaded_params exception to warning (#7002 )	2024-08-01 12:01:07 -07:00
Michael Goin	fb3db61688	[CI/Build] Remove sparseml requirement from testing (#7037 )	2024-08-01 12:00:51 -07:00
Isotr0py	2dd34371a6	[Bugfix] Fix RMSNorm forward in InternViT attention qk_layernorm (#6992 )	2024-08-01 12:00:28 -07:00
Sage Moore	7e0861bd0b	[CI/Build] Update PyTorch to 2.4.0 (#6951 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-01 11:11:24 -07:00
Alexei-V-Ivanov-AMD	a72a424b3e	[Build/CI] Fixing Docker Hub quota issue. (#7043 )	2024-08-01 11:07:37 -07:00
youkaichao	c8a7e93273	[core][scheduler] simplify and improve scheduler (#6867 )	2024-07-31 23:51:09 -07:00
zifeitong	3c10591ef2	[Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (#6954 )	2024-07-31 21:13:34 -07:00
Aurick Qiao	0437492ea9	PP comm optimization: replace send with partial send + allgather (#6695 ) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>	2024-07-31 20:15:42 -07:00
Travis Johnson	630dd9e0ae	[Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-07-31 19:49:11 -07:00
Woosuk Kwon	23993a7997	[Bugfix][TPU] Do not use torch.Generator for TPUs (#6981 )	2024-07-31 18:50:28 -07:00
xuyi	1d2e7fb73f	[Model] Pipeline parallel support for Qwen2 (#6924 )	2024-07-31 18:49:51 -07:00
Jee Jee Li	7ecee34321	[Kernel][RFC] Refactor the punica kernel based on Triton (#5036 )	2024-07-31 17:12:24 -07:00
Simon Mo	7eb0cb4a14	Revert "[Frontend] Factor out code for running uvicorn" (#7012 ) Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>	2024-07-31 16:34:26 -07:00
Michael Goin	a0dce9383a	[Misc] Add compressed-tensors to optimized quant list (#7006 )	2024-07-31 14:40:44 -07:00
Varun Sundar Rabindranath	35e9c12bfa	[Kernel] Tuned int8 Cutlass Kernels for SM75 (T4) (#6996 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-07-31 14:40:32 -07:00
Varun Sundar Rabindranath	93548eb37e	[Kernel] Enable FP8 Cutlass for Ada Lovelace (#6950 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-07-31 14:40:22 -07:00
Michael Goin	460c1884e3	[Bugfix] Support cpu offloading with fp8 quantization (#6960 )	2024-07-31 12:47:46 -07:00
Cody Yu	bd70013407	[MISC] Introduce pipeline parallelism partition strategies (#6920 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-07-31 12:02:17 -07:00
Avshalom Manevich	2ee8d3ba55	[Model] use FusedMoE layer in Jamba (#6935 )	2024-07-31 12:00:24 -07:00
Cyrus Leung	daed30c4a9	[Bugfix] Fix feature size calculation for LLaVA-NeXT (#6982 )	2024-07-31 23:46:17 +08:00
Alphi	2f4e108f75	[Bugfix] Clean up MiniCPM-V (#6939 ) Co-authored-by: hezhihui <hzh7269@modelbest.cn> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-07-31 14:39:19 +00:00
HandH1998	6512937de1	Support W4A8 quantization for vllm (#5218 )	2024-07-31 07:55:21 -06:00
Fei	c0644cf9ce	[Bugfix] fix logit processor excceed vocab size issue (#6927 )	2024-07-31 16:16:01 +08:00
Woosuk Kwon	533d1932d2	[Bugfix][TPU] Set readonly=True for non-root devices (#6980 )	2024-07-31 00:19:28 -07:00
Cyrus Leung	9f0e69b653	[CI/Build] Fix mypy errors (#6968 )	2024-07-30 19:49:48 -07:00
Cyrus Leung	f230cc2ca6	[Bugfix] Fix broadcasting logic for `multi_modal_kwargs` (#6836 )	2024-07-31 10:38:45 +08:00
Cyrus Leung	da1f7cc12a	[mypy] Enable following imports for some directories (#6681 )	2024-07-31 10:38:03 +08:00
Cade Daniel	c32ab8be1a	[Speculative decoding] Add serving benchmark for llama3 70b + speculative decoding (#6964 )	2024-07-31 00:53:21 +00:00
Cade Daniel	fb4f530bf5	[CI] [nightly benchmark] Do not re-download sharegpt dataset if exists (#6706 )	2024-07-30 16:28:49 -07:00
Cade Daniel	79319cedfa	[Nightly benchmarking suite] Remove pkill python from run benchmark suite (#6965 )	2024-07-30 16:28:05 -07:00
Simon Mo	40c27a7cbb	[Build] Temporarily Disable Kernels and LoRA tests (#6961 )	2024-07-30 14:59:48 -07:00
youkaichao	6ca8031e71	[core][misc] improve free_finished_seq_groups (#6865 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-07-30 14:32:12 -07:00
Tyler Michael Smith	d7a299edaa	[Kernel] Remove scaled_fp8_quant kernel padding footgun (#6842 )	2024-07-30 16:37:01 -04:00
Sanger Steel	052b6f8ca4	[Bugfix] Fix tensorizer memory profiling bug during testing (#6881 )	2024-07-30 11:48:50 -07:00
Ilya Lavrenov	5895b24677	[OpenVINO] Updated OpenVINO requirements and build docs (#6948 )	2024-07-30 11:33:01 -07:00

2172 Commits