20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Lucas Wilkinson	86e9c8df29	[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701) Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-09-23 13:46:26 -04:00
Daniele	ee5f34b1c2	[CI/Build] use setuptools-scm to set __version__ (#4738) Co-authored-by: youkaichao <youkaichao@126.com>	2024-09-23 09:44:26 -07:00
Jani Monoses	f2bd246c17	[VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use config.text_config.vocab_size (#8707)	2024-09-23 14:43:09 +00:00
Yanyi Liu	a79e522984	[Model] Support pp for qwen2-vl (#8696)	2024-09-23 13:46:59 +00:00
Li, Jiang	3e83c12b5c	[Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model_runner (#8733)	2024-09-23 13:15:16 +00:00
Isotr0py	e551ca1555	[Hardware][CPU] Refactor CPU model runner (#8729)	2024-09-23 20:12:20 +08:00
Alex Brooks	9b8c8ba119	[Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-09-23 07:44:48 +00:00
Yan Ma	d23679eb99	[Bugfix] fix docker build for xpu (#8652)	2024-09-22 22:54:18 -07:00
Luka Govedič	57a0702e63	[Bugfix] Fix CPU CMake build (#8723) Co-authored-by: Yuan <yuan.zhou@intel.com>	2024-09-22 20:40:46 -07:00
Tyler Michael Smith	3dda7c2250	[Bugfix] Avoid some bogus messages RE CUTLASS's revision when building (#8702)	2024-09-22 22:24:59 -04:00
youkaichao	92ba7e7477	[misc] upgrade mistral-common (#8715)	2024-09-22 15:41:59 -07:00
youkaichao	d4a2ac8302	[build] enable existing pytorch (for GH200, aarch64, nightly) (#8713)	2024-09-22 12:47:54 -07:00
Lily Liu	c6bd70d772	[SpecDec][Misc] Cleanup, remove bonus token logic. (#8701)	2024-09-22 12:34:14 -07:00
litianjian	5b59532760	[Model][VLM] Add LLaVA-Onevision model support (#8486) Co-authored-by: litianjian <litianjian@bytedance.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-22 10:51:44 -07:00
Huazhong Ji	ca2b628b3c	[MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler (#8703)	2024-09-22 10:44:09 -07:00
Alex Brooks	8ca5051b9a	[Misc] Use NamedTuple in Multi-image example (#8705) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-09-22 20:56:20 +08:00
Cyrus Leung	06ed2815e2	[Model] Refactor BLIP/BLIP-2 to support composite model loading (#8407)	2024-09-22 12:24:21 +00:00
youkaichao	0e40ac9b7b	[ci][build] fix vllm-flash-attn (#8699)	2024-09-21 23:24:58 -07:00
Isotr0py	13d88d4137	[Bugfix] Refactor composite weight loading logic (#8656)	2024-09-22 04:33:27 +00:00
Tyler Michael Smith	d66ac62854	[Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (#8643)	2024-09-21 23:45:02 +00:00
Divakar Verma	9dc7c6c7f3	[dbrx] refactor dbrx experts to extend FusedMoe class (#8518)	2024-09-21 15:09:39 -06:00
rasmith	ec4aaad812	[Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x speedup MI300, minor improvement for MI250 (#8646)	2024-09-21 09:20:54 +00:00
Andy Dai	4dfdf43196	[Doc] Fix typo in AMD installation guide (#8689)	2024-09-21 00:24:12 -07:00
Cyrus Leung	5e85f4f82a	[VLM] Use `SequenceData.from_token_counts` to create dummy data (#8687)	2024-09-20 23:28:56 -07:00
Luka Govedič	71c60491f2	[Kernel] Build flash-attn from source (#8245)	2024-09-20 23:27:10 -07:00
youkaichao	0faab90eb0	[beam search] add output for manually checking the correctness (#8684)	2024-09-20 19:55:33 -07:00
Cyrus Leung	0455c46ed4	[Core] Factor out common code in `SequenceData` and `Sequence` (#8675)	2024-09-21 02:30:39 +00:00
Kunshang Ji	d4bf085ad0	[MISC] add support custom_op check (#8557) Co-authored-by: youkaichao <youkaichao@126.com>	2024-09-20 19:03:55 -07:00
Cyrus Leung	0057894ef7	[Core] Rename `PromptInputs` and `inputs` (#8673)	2024-09-20 19:00:54 -07:00
zyddnys	0f961b3ce9	[Bugfix] Fix incorrect llava next feature size calculation (#8496)	2024-09-20 22:48:32 +00:00
omrishiv	7f9c8902e3	[Hardware][AWS] update neuron to 2.20 (#8676) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-09-20 15:19:44 -07:00
omrishiv	7c8566aa4f	[Doc] neuron documentation update (#8671) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-09-20 15:04:37 -07:00
Patrick von Platen	b4e4eda92e	[Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640)	2024-09-20 14:33:03 -07:00
Pastel！	2874bac618	[Bugfix] Config got an unexpected keyword argument 'engine' (#8556)	2024-09-20 14:00:45 -07:00
Cyrus Leung	035fa895ec	[Misc] Show AMD GPU topology in `collect_env.py` (#8649)	2024-09-20 13:52:19 -07:00
saumya-saran	b28298f2f4	[Bugfix] Validate SamplingParam n is an int (#8548)	2024-09-20 12:46:02 -07:00
Alexey Kondratiev(AMD)	2940afa04e	[CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (#8670)	2024-09-20 10:27:44 -07:00
Niklas Muennighoff	3b63de9353	[Model] Add OLMoE (#7922)	2024-09-20 09:31:41 -07:00
Jiaxin Shan	260d40b5ea	[Core] Support Lora lineage and base model metadata management (#6315)	2024-09-20 06:20:56 +00:00
William Lin	9e5ec35b1f	[bugfix] [AMD] add multi-step advance_step to ROCmFlashAttentionMetadata (#8474)	2024-09-19 20:49:54 -07:00
Amit Garg	18ae428a0d	[Bugfix] Fix Phi3.5 mini and MoE LoRA inference (#8571)	2024-09-20 08:54:02 +08:00
bnellnm	de6f90a13d	[Misc] guard against change in cuda library name (#8609)	2024-09-20 06:36:30 +08:00
Alexey Kondratiev(AMD)	6cb748e190	[CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (#8551)	2024-09-19 13:06:32 -07:00
Simon Mo	9e99407e3c	Create SECURITY.md (#8642)	2024-09-19 12:16:28 -07:00
Isotr0py	ea4647b7d7	[Doc] Add documentation for GGUF quantization (#8618)	2024-09-19 13:15:55 -06:00
盏一	e42c634acb	[Core] simplify logits resort in _apply_top_k_top_p (#8619)	2024-09-19 18:28:25 +00:00
Charlie Fu	9cc373f390	[Kernel][Amd] Add fp8 kv cache support for rocm custom paged attention (#8577)	2024-09-19 17:37:57 +00:00
Nick Hill	76515f303b	[Frontend] Use MQLLMEngine for embeddings models too (#8584)	2024-09-19 12:51:06 -04:00
Kunshang Ji	855c8ae2c9	[MISC] remove engine_use_ray in benchmark_throughput.py (#8615)	2024-09-18 22:33:20 -07:00
Kuntai Du	c52ec5f034	[Bugfix] fixing sonnet benchmark bug in benchmark_serving.py (#8616)	2024-09-19 05:24:24 +00:00

2753 Commits