Lucas Wilkinson | a8d604ca2a | [Misc] Disambiguate quantized types via a new ScalarType (#6396) | 2024-08-02 13:51:58 -07:00
Michael Goin | b482b9a5b1 | [CI/Build] Add support for Python 3.12 (#7035) | 2024-08-02 13:51:22 -07:00
youkaichao | 806949514a | [ci] set timeout for test_oot_registration.py (#7082) | 2024-08-02 10:03:24 -07:00
Jie Fu (傅杰) | c16eaac500 | [Hardware][Intel CPU] Update torch 2.4.0 for CPU backend (#6931) | 2024-08-02 08:55:58 -07:00
Peng Guanwen | db35186391 | [Core] Comment out unused code in sampler (#7023) | 2024-08-02 00:58:26 -07:00
youkaichao | 660dea1235 | [cuda][misc] remove error_on_invalid_device_count_status (#7069) | 2024-08-02 00:14:21 -07:00
Bongwon Jang | cf2a1a4d9d | Fix tracing.py (#7065) | 2024-08-01 23:28:00 -07:00
youkaichao | 252357793d | [ci][distributed] try to fix pp test (#7054) | 2024-08-01 22:03:12 -07:00
Cyrus Leung | 3bb4b1e4cd | [mypy] Speed up mypy checking (#7056) | 2024-08-01 19:49:43 -07:00
Lily Liu | 954f7305a1 | [Kernel] Fix input for flashinfer prefill wrapper. (#7008) | 2024-08-01 18:44:16 -07:00
Woosuk Kwon | 6ce01f3066 | [Performance] Optimize get_seqs (#7051) | 2024-08-01 18:29:52 -07:00
Tyler Michael Smith | 6a11fdfbb8 | [CI/Build][Bugfix] Fix CUTLASS header-only line (#7034) | 2024-08-01 13:51:15 -07:00
Woosuk Kwon | 805a8a75f2 | [Misc] Support attention logits soft-capping with flash-attn (#7022) | 2024-08-01 13:14:37 -07:00
omkar kakarparthi | 562e580abc | Update run-amd-test.sh (#7044) | 2024-08-01 13:12:37 -07:00
Murali Andoorveedu | fc912e0886 | [Models] Support Qwen model with PP (#6974) | 2024-08-01 12:40:43 -07:00
  Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Michael Goin | f4fd390f5d | [Bugfix] Lower gemma's unloaded_params exception to warning (#7002) | 2024-08-01 12:01:07 -07:00
Michael Goin | fb3db61688 | [CI/Build] Remove sparseml requirement from testing (#7037) | 2024-08-01 12:00:51 -07:00
Isotr0py | 2dd34371a6 | [Bugfix] Fix RMSNorm forward in InternViT attention qk_layernorm (#6992) | 2024-08-01 12:00:28 -07:00
Sage Moore | 7e0861bd0b | [CI/Build] Update PyTorch to 2.4.0 (#6951) | 2024-08-01 11:11:24 -07:00
  Co-authored-by: Michael Goin <michael@neuralmagic.com>
Alexei-V-Ivanov-AMD | a72a424b3e | [Build/CI] Fixing Docker Hub quota issue. (#7043) | 2024-08-01 11:07:37 -07:00
youkaichao | c8a7e93273 | [core][scheduler] simplify and improve scheduler (#6867) | 2024-07-31 23:51:09 -07:00
zifeitong | 3c10591ef2 | [Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (#6954) | 2024-07-31 21:13:34 -07:00
Aurick Qiao | 0437492ea9 | PP comm optimization: replace send with partial send + allgather (#6695) | 2024-07-31 20:15:42 -07:00
  Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Travis Johnson | 630dd9e0ae | [Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758) | 2024-07-31 19:49:11 -07:00
  Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Woosuk Kwon | 23993a7997 | [Bugfix][TPU] Do not use torch.Generator for TPUs (#6981) | 2024-07-31 18:50:28 -07:00
xuyi | 1d2e7fb73f | [Model] Pipeline parallel support for Qwen2 (#6924) | 2024-07-31 18:49:51 -07:00
Jee Jee Li | 7ecee34321 | [Kernel][RFC] Refactor the punica kernel based on Triton (#5036) | 2024-07-31 17:12:24 -07:00
Simon Mo | 7eb0cb4a14 | Revert "[Frontend] Factor out code for running uvicorn" (#7012) | 2024-07-31 16:34:26 -07:00
  Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Michael Goin | a0dce9383a | [Misc] Add compressed-tensors to optimized quant list (#7006) | 2024-07-31 14:40:44 -07:00
Varun Sundar Rabindranath | 35e9c12bfa | [Kernel] Tuned int8 Cutlass Kernels for SM75 (T4) (#6996) | 2024-07-31 14:40:32 -07:00
  Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Varun Sundar Rabindranath | 93548eb37e | [Kernel] Enable FP8 Cutlass for Ada Lovelace (#6950) | 2024-07-31 14:40:22 -07:00
  Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Michael Goin | 460c1884e3 | [Bugfix] Support cpu offloading with fp8 quantization (#6960) | 2024-07-31 12:47:46 -07:00
Cody Yu | bd70013407 | [MISC] Introduce pipeline parallelism partition strategies (#6920) | 2024-07-31 12:02:17 -07:00
  Co-authored-by: youkaichao <youkaichao@126.com>
Avshalom Manevich | 2ee8d3ba55 | [Model] use FusedMoE layer in Jamba (#6935) | 2024-07-31 12:00:24 -07:00
Cyrus Leung | daed30c4a9 | [Bugfix] Fix feature size calculation for LLaVA-NeXT (#6982) | 2024-07-31 23:46:17 +08:00
Alphi | 2f4e108f75 | [Bugfix] Clean up MiniCPM-V (#6939) | 2024-07-31 14:39:19 +00:00
  Co-authored-by: hezhihui <hzh7269@modelbest.cn>
  Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
HandH1998 | 6512937de1 | Support W4A8 quantization for vllm (#5218) | 2024-07-31 07:55:21 -06:00
Fei | c0644cf9ce | [Bugfix] fix logit processor exceed vocab size issue (#6927) | 2024-07-31 16:16:01 +08:00
Woosuk Kwon | 533d1932d2 | [Bugfix][TPU] Set readonly=True for non-root devices (#6980) | 2024-07-31 00:19:28 -07:00
Cyrus Leung | 9f0e69b653 | [CI/Build] Fix mypy errors (#6968) | 2024-07-30 19:49:48 -07:00
Cyrus Leung | f230cc2ca6 | [Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836) | 2024-07-31 10:38:45 +08:00
Cyrus Leung | da1f7cc12a | [mypy] Enable following imports for some directories (#6681) | 2024-07-31 10:38:03 +08:00
Cade Daniel | c32ab8be1a | [Speculative decoding] Add serving benchmark for llama3 70b + speculative decoding (#6964) | 2024-07-31 00:53:21 +00:00
Cade Daniel | fb4f530bf5 | [CI] [nightly benchmark] Do not re-download sharegpt dataset if exists (#6706) | 2024-07-30 16:28:49 -07:00
Cade Daniel | 79319cedfa | [Nightly benchmarking suite] Remove pkill python from run benchmark suite (#6965) | 2024-07-30 16:28:05 -07:00
Simon Mo | 40c27a7cbb | [Build] Temporarily Disable Kernels and LoRA tests (#6961) | 2024-07-30 14:59:48 -07:00
youkaichao | 6ca8031e71 | [core][misc] improve free_finished_seq_groups (#6865) | 2024-07-30 14:32:12 -07:00
  Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Tyler Michael Smith | d7a299edaa | [Kernel] Remove scaled_fp8_quant kernel padding footgun (#6842) | 2024-07-30 16:37:01 -04:00
Sanger Steel | 052b6f8ca4 | [Bugfix] Fix tensorizer memory profiling bug during testing (#6881) | 2024-07-30 11:48:50 -07:00
Ilya Lavrenov | 5895b24677 | [OpenVINO] Updated OpenVINO requirements and build docs (#6948) | 2024-07-30 11:33:01 -07:00