20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Isotr0py	13d88d4137	[Bugfix] Refactor composite weight loading logic (#8656 )	2024-09-22 04:33:27 +00:00
Tyler Michael Smith	d66ac62854	[Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (#8643 )	2024-09-21 23:45:02 +00:00
Divakar Verma	9dc7c6c7f3	[dbrx] refactor dbrx experts to extend FusedMoe class (#8518 )	2024-09-21 15:09:39 -06:00
rasmith	ec4aaad812	[Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x speedup MI300, minor improvement for MI250 (#8646 )	2024-09-21 09:20:54 +00:00
Andy Dai	4dfdf43196	[Doc] Fix typo in AMD installation guide (#8689 )	2024-09-21 00:24:12 -07:00
Cyrus Leung	5e85f4f82a	[VLM] Use `SequenceData.from_token_counts` to create dummy data (#8687 )	2024-09-20 23:28:56 -07:00
Luka Govedič	71c60491f2	[Kernel] Build flash-attn from source (#8245 )	2024-09-20 23:27:10 -07:00
youkaichao	0faab90eb0	[beam search] add output for manually checking the correctness (#8684 )	2024-09-20 19:55:33 -07:00
Cyrus Leung	0455c46ed4	[Core] Factor out common code in `SequenceData` and `Sequence` (#8675 )	2024-09-21 02:30:39 +00:00
Kunshang Ji	d4bf085ad0	[MISC] add support custom_op check (#8557 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-09-20 19:03:55 -07:00
Cyrus Leung	0057894ef7	[Core] Rename `PromptInputs` and `inputs`(#8673 )	2024-09-20 19:00:54 -07:00
zyddnys	0f961b3ce9	[Bugfix] Fix incorrect llava next feature size calculation (#8496 )	2024-09-20 22:48:32 +00:00
omrishiv	7f9c8902e3	[Hardware][AWS] update neuron to 2.20 (#8676 ) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-09-20 15:19:44 -07:00
omrishiv	7c8566aa4f	[Doc] neuron documentation update (#8671 ) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-09-20 15:04:37 -07:00
Patrick von Platen	b4e4eda92e	[Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640 )	2024-09-20 14:33:03 -07:00
Pastel！	2874bac618	[Bugfix] Config got an unexpected keyword argument 'engine' (#8556 )	2024-09-20 14:00:45 -07:00
Cyrus Leung	035fa895ec	[Misc] Show AMD GPU topology in `collect_env.py` (#8649 )	2024-09-20 13:52:19 -07:00
saumya-saran	b28298f2f4	[Bugfix] Validate SamplingParam n is an int (#8548 )	2024-09-20 12:46:02 -07:00
Alexey Kondratiev(AMD)	2940afa04e	[CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (#8670 )	2024-09-20 10:27:44 -07:00
Niklas Muennighoff	3b63de9353	[Model] Add OLMoE (#7922 )	2024-09-20 09:31:41 -07:00
Jiaxin Shan	260d40b5ea	[Core] Support Lora lineage and base model metadata management (#6315 )	2024-09-20 06:20:56 +00:00
William Lin	9e5ec35b1f	[bugfix] [AMD] add multi-step advance_step to ROCmFlashAttentionMetadata (#8474 )	2024-09-19 20:49:54 -07:00
Amit Garg	18ae428a0d	[Bugfix] Fix Phi3.5 mini and MoE LoRA inference (#8571 )	2024-09-20 08:54:02 +08:00
bnellnm	de6f90a13d	[Misc] guard against change in cuda library name (#8609 )	2024-09-20 06:36:30 +08:00
Alexey Kondratiev(AMD)	6cb748e190	[CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (#8551 )	2024-09-19 13:06:32 -07:00
Simon Mo	9e99407e3c	Create SECURITY.md (#8642 )	2024-09-19 12:16:28 -07:00
Isotr0py	ea4647b7d7	[Doc] Add documentation for GGUF quantization (#8618 )	2024-09-19 13:15:55 -06:00
盏一	e42c634acb	[Core] simplify logits resort in _apply_top_k_top_p (#8619 )	2024-09-19 18:28:25 +00:00
Charlie Fu	9cc373f390	[Kernel][Amd] Add fp8 kv cache support for rocm custom paged attention (#8577 )	2024-09-19 17:37:57 +00:00
Nick Hill	76515f303b	[Frontend] Use MQLLMEngine for embeddings models too (#8584 )	2024-09-19 12:51:06 -04:00
Kunshang Ji	855c8ae2c9	[MISC] remove engine_use_ray in benchmark_throughput.py (#8615 )	2024-09-18 22:33:20 -07:00
Kuntai Du	c52ec5f034	[Bugfix] fixing sonnet benchmark bug in benchmark_serving.py (#8616 )	2024-09-19 05:24:24 +00:00
Roger Wang	02c9afa2d0	Revert "[Misc][Bugfix] Disable guided decoding for mistral tokenizer" (#8593 )	2024-09-19 04:14:28 +00:00
sroy745	3118f63385	[Bugfix] [Encoder-Decoder] Bugfix for encoder specific metadata construction during decode of encoder-decoder models. (#8545 )	2024-09-19 02:24:15 +00:00
Tyler Michael Smith	4c34ce8916	[Kernel] Remove marlin moe templating on thread_m_blocks (#8573 ) Co-authored-by: lwilkinson@neuralmagic.com	2024-09-19 01:42:49 +00:00
Joe Runde	0d47bf3bf4	[Bugfix] add `dead_error` property to engine client (#8574 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-09-18 22:10:01 +00:00
Nick Hill	d9cd78eb71	[BugFix] Nonzero exit code if MQLLMEngine startup fails (#8572 )	2024-09-18 20:17:55 +00:00
Tyler Michael Smith	db9120cded	[Kernel] Change interface to Mamba selective_state_update for continuous batching (#8039 )	2024-09-18 20:05:06 +00:00
Gregory Shtrasberg	b3195bc9e4	[AMD][ROCm]Quantization methods on ROCm; Fix _scaled_mm call (#8380 ) Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-18 10:41:08 -07:00
Geun, Lim	e18749ff09	[Model] Support Solar Model (#8386 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-18 11:04:00 -06:00
Russell Bryant	d65798f78c	[Core] zmq: bind only to 127.0.0.1 for local-only usage (#8543 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-09-18 16:10:27 +00:00
afeldman-nm	a8c1d161a7	[Core] Prompt logprobs support in Multi-step (#8199 )	2024-09-18 08:38:43 -07:00
Alexander Matveev	7c7714d856	[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#8157 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-09-18 13:56:58 +00:00
Aaron Pham	9d104b5beb	[CI/Build] Update Ruff version (#8469 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-18 11:00:56 +00:00
Cyrus Leung	6ffa3f314c	[CI/Build] Avoid CUDA initialization (#8534 )	2024-09-18 10:38:11 +00:00
Jiaxin Shan	e351572900	[Misc] Add argument to disable FastAPI docs (#8554 )	2024-09-18 09:51:59 +00:00
Daniele	95965d31b6	[CI/Build] fix Dockerfile.cpu on podman (#8540 )	2024-09-18 10:49:53 +08:00
Tyler Michael Smith	8110e44529	[Kernel] Change interface to Mamba causal_conv1d_update for continuous batching (#8012 )	2024-09-17 23:44:27 +00:00
Alexey Kondratiev(AMD)	09deb4721f	[CI/Build] Excluding kernels/test_gguf.py from ROCm (#8520 )	2024-09-17 16:40:29 -07:00
youkaichao	fa0c114fad	[doc] improve installation doc (#8550 ) Co-authored-by: Andy Dai <76841985+Imss27@users.noreply.github.com>	2024-09-17 16:24:06 -07:00

2735 Commits