20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
youkaichao	5b8a7c1cb0	[Misc] centralize all usage of environment variables (#4548)	2024-05-02 11:13:25 -07:00
Ruoyu Qin	dfea173148	[Bugfix] Abort requests when the connection to /v1/completions is interrupted (#4363)	2024-04-27 09:48:37 -07:00
SangBin Cho	a88081bf76	[CI] Disable non-lazy string operation on logging (#4326) Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>	2024-04-26 00:16:58 -07:00
Nick Hill	efffb63f58	[Core] Move function tracing setup to util function (#4352)	2024-04-25 16:45:12 -07:00
youkaichao	c1b4e4157c	[Core][Distributed] use absolute path for library file (#4271)	2024-04-22 17:21:48 -07:00
youkaichao	8a7a3e4436	[Core] add an option to log every function call to for debugging hang/crash in distributed inference (#4079) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-04-18 16:15:12 -07:00
youkaichao	6dc1fc9cfe	[Core] nccl integrity check and test (#4155) [Core] Add integrity check during initialization; add test for it (#4155)	2024-04-17 22:28:52 -07:00
youkaichao	8438e0569e	[Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024) [Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication (#4024)	2024-04-17 08:34:33 +00:00
SangBin Cho	37e84a403d	[Typing] Fix Sequence type GenericAlias only available after Python 3.9. (#4092)	2024-04-15 14:47:31 -07:00
SangBin Cho	09473ee41c	[mypy] Add mypy type annotation part 1 (#4006)	2024-04-12 14:35:50 -07:00
Cyrus Leung	7fd3949a0b	[Frontend][Core] Move `merge_async_iterators` to utils (#4026)	2024-04-12 05:30:54 +00:00
Michael Feil	c2b4a1bce9	[Doc] Add typing hints / mypy types cleanup (#3816) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-04-11 17:17:21 -07:00
bigPYJ1151	8afca50889	[Hardware][Intel] Isolate CPUModelRunner and ModelRunner for better maintenance (#3824)	2024-04-11 11:56:49 -07:00
Kunshang Ji	e9da5a40c6	[Misc] Add indirection layer for custom ops (#3913)	2024-04-10 20:26:07 -07:00
zhaotyer	c2e00af523	[Bugfix] fix utils.py/merge_dict func TypeError: 'type' object is not subscriptable (#3955) Co-authored-by: tianyi_zhao <tianyi.zhao@transwarp.io>	2024-04-10 04:49:11 +00:00
Adrian Abeyta	2ff767b513	Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com> Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu> Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com> Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com> Co-authored-by: guofangze <guofangze@kuaishou.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-04-03 14:15:55 -07:00
SangBin Cho	3dcb3e8b98	[3/N] Refactor scheduler for chunked prefill scheduling (#3550)	2024-04-03 14:13:49 -07:00
Nick Hill	c9b506dad4	[BugFix] Use different mechanism to get vllm version in `is_cpu()` (#3804)	2024-04-02 23:06:25 -07:00
bigPYJ1151	0e3f06fe9c	[Hardware][Intel] Add CPU inference backend (#3634) Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>	2024-04-01 22:07:30 -07:00
Cade Daniel	14ccd94c89	[Core][Bugfix]Refactor block manager for better testability (#3492)	2024-03-27 23:59:28 -07:00
Nick Hill	0dc72273b8	[BugFix] Fix ipv4 address parsing regression (#3645)	2024-03-26 14:39:44 -07:00
liiliiliil	a979d9771e	[Bugfix] Fix ipv6 address parsing bug (#3641)	2024-03-26 11:58:20 -07:00
xwjiang2010	64172a976c	[Feature] Add vision language model support. (#3042)	2024-03-25 14:16:30 -07:00
SangBin Cho	01bfb22b41	[CI] Try introducing isort. (#3495)	2024-03-25 07:59:47 -07:00
Zhuohan Li	e90fc21f2e	[Hardware][Neuron] Refactor neuron support (#3471)	2024-03-22 01:22:17 +00:00
Nick Hill	4ad521d8b5	[Core] Add generic typing to `LRUCache` (#3511)	2024-03-20 00:36:09 -07:00
Simon Mo	20478c4d3a	Use lru_cache for some environment detection utils (#3508)	2024-03-19 21:34:15 +00:00
Simon Mo	cc63d03fbb	Revert "[Core] Cache some utils" (#3507)	2024-03-19 13:22:58 -07:00
Antoni Baum	b37cdce2b1	[Core] Cache some utils (#3474)	2024-03-18 17:14:26 -07:00
youkaichao	b522c4476f	[Misc] add HOST_IP env var (#3419) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-03-14 21:32:52 -07:00
Zhuohan Li	2f8844ba08	Re-enable the 80 char line width limit (#3305)	2024-03-10 19:49:14 -07:00
Michael Goin	385da2dae2	Measure model memory usage (#3120)	2024-03-07 11:42:42 -08:00
ttbachyinsda	76e8a70476	[Minor fix] The domain dns.google may cause a socket.gaierror exception (#3176) Co-authored-by: guofangze <guofangze@kuaishou.com>	2024-03-04 19:17:12 +00:00
Liangfu Chen	3b7178cfa4	[Neuron] Support inference with transformers-neuronx (#2569)	2024-02-28 09:34:34 -08:00
Jingru	2410e320b3	fix `get_ip` error in pure ipv6 environment (#2931)	2024-02-26 19:22:16 -08:00
zhaoyang-star	57f044945f	Fix nvcc not found in vlm-openai image (#2781)	2024-02-22 14:25:07 -08:00
Massimiliano Pronesti	93dc5a2870	chore(vllm): codespell for spell checking (#2820)	2024-02-21 18:56:01 -08:00
Lily Liu	fe6d09ae61	[Minor] More fix of test_cache.py CI test failure (#2750)	2024-02-06 11:38:38 -08:00
Kunshang Ji	96b6f475dd	Remove hardcoded `device="cuda"` to support more devices (#2503) Co-authored-by: Jiang Li <jiang1.li@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2024-02-01 15:46:39 -08:00
zhaoyang-star	9090bf02e7	Support FP8-E5M2 KV Cache (#2279) Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-01-28 16:43:54 -08:00
Hongxia Yang	6b7de1a030	[ROCm] add support to ROCm 6.0 and MI300 (#2274)	2024-01-26 12:41:10 -08:00
Antoni Baum	9b945daaf1	[Experimental] Add multi-LoRA support (#1804) Co-authored-by: Chen Shen <scv119@gmail.com> Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com> Co-authored-by: Avnish Narayan <avnish@anyscale.com>	2024-01-23 15:26:37 -08:00
Cade Daniel	18bfcdd05c	[Speculative decoding 2/9] Multi-step worker for draft model (#2424)	2024-01-21 16:31:47 -08:00
Liangfu Chen	18473cf498	[Neuron] Add an option to build with neuron (#2065)	2024-01-18 10:58:50 -08:00
Yunfeng Bai	4b61c6b669	`get_ip()`: Fix ipv4 ipv6 dualstack (#2408)	2024-01-10 11:39:58 -08:00
Zhuohan Li	fd4ea8ef5c	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
Woosuk Kwon	37ca558103	Optimize model execution with CUDA graph (#1926) Co-authored-by: Chen Shen <scv119@gmail.com> Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2023-12-16 21:12:08 -08:00
Woosuk Kwon	30bad5c492	Fix peak memory profiling (#2031)	2023-12-12 22:01:53 -08:00
TJian	6ccc0bfffb	Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) Co-authored-by: Philipp Moritz <pcmoritz@gmail.com> Co-authored-by: Amir Balwel <amoooori04@gmail.com> Co-authored-by: root <kuanfu.liu@akirakan.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com> Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>	2023-12-07 23:16:52 -08:00
Yanming W	e0c6f556e8	[Build] Avoid building too many extensions (#1624)	2023-11-23 16:31:19 -08:00

55 Commits