20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
omrishiv	7c8566aa4f	[Doc] neuron documentation update (#8671 ) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-09-20 15:04:37 -07:00
youkaichao	fa0c114fad	[doc] improve installation doc (#8550 ) Co-authored-by: Andy Dai <76841985+Imss27@users.noreply.github.com>	2024-09-17 16:24:06 -07:00
youkaichao	2759a43a26	[doc] update doc on testing and debugging (#8514 )	2024-09-16 12:10:23 -07:00
Isotr0py	f57092c00b	[Doc] Add oneDNN installation to CPU backend documentation (#8467 )	2024-09-13 18:06:30 +00:00
youkaichao	cab69a15e4	[doc] recommend pip instead of conda (#8446 )	2024-09-12 23:52:41 -07:00
Cyrus Leung	288a938872	[Doc] Indicate more information about supported modalities (#8181 )	2024-09-05 10:51:53 +00:00
Woosuk Kwon	61f4a93d14	[TPU][Bugfix] Use XLA rank for persistent cache path (#8137 )	2024-09-03 18:35:33 -07:00
Woosuk Kwon	eeffde1ac0	[TPU] Upgrade PyTorch XLA nightly (#7967 )	2024-08-28 13:10:21 -07:00
Ilya Lavrenov	398521ad19	[OpenVINO] Updated documentation (#7687 )	2024-08-20 07:33:56 -06:00
youkaichao	199adbb7cf	[doc] update test script to include cudagraph (#7501 )	2024-08-13 21:52:58 -07:00
Woosuk Kwon	a08df8322e	[TPU] Support multi-host inference (#7457 )	2024-08-13 16:31:20 -07:00
tomeras91	02b1988b9f	[Doc] building vLLM with VLLM_TARGET_DEVICE=empty (#7403 )	2024-08-11 14:38:17 -07:00
Woosuk Kwon	90bab18f24	[TPU] Use mark_dynamic to reduce compilation time (#7340 )	2024-08-10 18:12:22 -07:00
Ilya Lavrenov	80cbe10c59	[OpenVINO] migrate to latest dependencies versions (#7251 )	2024-08-07 09:49:10 -07:00
Simon Mo	4db5176d97	bump version to v0.5.4 (#7139 )	2024-08-05 14:39:48 -07:00
Michael Goin	b482b9a5b1	[CI/Build] Add support for Python 3.12 (#7035 )	2024-08-02 13:51:22 -07:00
Jee Jee Li	7ecee34321	[Kernel][RFC] Refactor the punica kernel based on Triton (#5036 )	2024-07-31 17:12:24 -07:00
Ilya Lavrenov	5895b24677	[OpenVINO] Updated OpenVINO requirements and build docs (#6948 )	2024-07-30 11:33:01 -07:00
Woosuk Kwon	fad5576c58	[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856 )	2024-07-27 10:28:33 -07:00
omrishiv	3c3012398e	[Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron (#6844 ) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-07-26 20:20:16 -07:00
Woosuk Kwon	ced36cd89b	[ROCm] Upgrade PyTorch nightly version (#6845 )	2024-07-26 20:16:13 -07:00
Li, Jiang	3bbb4936dc	[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125 )	2024-07-26 13:50:10 -07:00
youkaichao	85ad7e2d01	[doc][debugging] add known issues for hangs (#6816 )	2024-07-25 21:48:05 -07:00
Hongxia Yang	d88c458f44	[Doc][AMD][ROCm]Added tips to refer to mi300x tuning guide for mi300x users (#6754 )	2024-07-24 14:32:57 -07:00
Woosuk Kwon	ccc4a73257	[Docs][ROCm] Detailed instructions to build from source (#6680 )	2024-07-24 01:07:23 -07:00
Matt Wong	06d6c5fe9f	[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543 )	2024-07-20 09:39:07 -07:00
Simon Mo	30efe41532	[Docs] Update docs for wheel location (#6580 )	2024-07-19 12:14:11 -07:00
Cyrus Leung	5bf35a91e4	[Doc][CI/Build] Update docs and tests to use `vllm serve` (#6431 )	2024-07-17 07:43:21 +00:00
Hongxia Yang	10383887e0	[ROCm] Cleanup Dockerfile and remove outdated patch (#6482 )	2024-07-16 22:47:02 -07:00
Woosuk Kwon	c467dff24f	[Hardware][TPU] Support MoE with Pallas GMM kernel (#6457 )	2024-07-16 09:56:28 -07:00
youkaichao	9f4ccec761	[doc][misc] remind to cancel debugging environment variables (#6481 )	2024-07-16 09:45:30 -07:00
youkaichao	22e79ee8f3	[doc][misc] doc update (#6439 )	2024-07-14 23:33:25 -07:00
Robert Cohn	61e85dbad8	[Doc] xpu backend requires running setvars.sh (#6393 )	2024-07-14 17:10:11 -07:00
Simon Mo	d719ba24c5	Build some nightly wheels by default (#6380 )	2024-07-12 13:56:59 -07:00
Jie Fu (傅杰)	439c84581a	[Doc] Update description of vLLM support for CPUs (#6003 )	2024-07-10 21:15:29 -07:00
youkaichao	966fe72141	[doc][misc] bump up py version in installation doc (#6119 )	2024-07-03 15:52:04 -07:00
Ilya Lavrenov	57f09a419c	[Hardware][Intel] OpenVINO vLLM backend (#5379 )	2024-06-28 13:50:16 +00:00
Matt Wong	dd793d1de5	[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422 )	2024-06-25 15:56:15 -07:00
youkaichao	c18ebfdd71	[doc][distributed] add both gloo and nccl tests (#5834 )	2024-06-25 15:10:28 -04:00
Woosuk Kwon	8c00f9c15d	[Docs][TPU] Add installation tip for TPU (#5761 )	2024-06-21 23:09:40 -07:00
Kunshang Ji	728c4c8a06	[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814 ) Co-authored-by: Jiang Li <jiang1.li@intel.com> Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com> Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-06-17 11:01:25 -07:00
youkaichao	845a3f26f9	[Doc] add debugging tips for crash and multi-node debugging (#5581 )	2024-06-17 10:08:01 +08:00
Li, Jiang	80aa7e91fc	[Hardware][Intel] Optimize CPU backend and add more performance tips (#4971 ) Co-authored-by: Jianan Gu <jianan.gu@intel.com>	2024-06-13 09:33:14 -07:00
Cyrus Leung	b8d4dfff9c	[Doc] Update debug docs (#5438 )	2024-06-12 14:49:31 -07:00
Woosuk Kwon	1a8bfd92d5	[Hardware] Initial TPU integration (#5292 )	2024-06-12 11:53:03 -07:00
youkaichao	8f89d72090	[Doc] add common case for long waiting time (#5430 )	2024-06-11 11:12:13 -07:00
youkaichao	d8f31f2f8b	[Doc] add debugging tips (#5409 )	2024-06-10 23:21:43 -07:00
Jie Fu (傅杰)	87d5abef75	[Bugfix] Fix a bug caused by pip install setuptools>=49.4.0 for CPU backend (#5249 )	2024-06-04 09:57:51 -07:00
youkaichao	6a50f4cafa	[Doc] add ccache guide in doc (#5012 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-05-23 23:21:54 +00:00
fuchen.ljl	ee37328da0	Unable to find Punica extension issue during source code installation (#4494 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-05-01 00:42:09 +00:00

86 Commits