2753 Commits

Author SHA1 Message Date
Lucas Wilkinson
86e9c8df29
[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-09-23 13:46:26 -04:00
Daniele
ee5f34b1c2
[CI/Build] use setuptools-scm to set __version__ (#4738)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-23 09:44:26 -07:00
Jani Monoses
f2bd246c17
[VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use config.text_config.vocab_size (#8707)
2024-09-23 14:43:09 +00:00
Yanyi Liu
a79e522984
[Model] Support pp for qwen2-vl (#8696)
2024-09-23 13:46:59 +00:00
Li, Jiang
3e83c12b5c
[Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model_runner (#8733)
2024-09-23 13:15:16 +00:00
Isotr0py
e551ca1555
[Hardware][CPU] Refactor CPU model runner (#8729)
2024-09-23 20:12:20 +08:00
Alex Brooks
9b8c8ba119
[Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-09-23 07:44:48 +00:00
Yan Ma
d23679eb99
[Bugfix] fix docker build for xpu (#8652)
2024-09-22 22:54:18 -07:00
Luka Govedič
57a0702e63
[Bugfix] Fix CPU CMake build (#8723)
Co-authored-by: Yuan <yuan.zhou@intel.com>
2024-09-22 20:40:46 -07:00
Tyler Michael Smith
3dda7c2250
[Bugfix] Avoid some bogus messages RE CUTLASS's revision when building (#8702)
2024-09-22 22:24:59 -04:00
youkaichao
92ba7e7477
[misc] upgrade mistral-common (#8715)
2024-09-22 15:41:59 -07:00
youkaichao
d4a2ac8302
[build] enable existing pytorch (for GH200, aarch64, nightly) (#8713)
2024-09-22 12:47:54 -07:00
Lily Liu
c6bd70d772
[SpecDec][Misc] Cleanup, remove bonus token logic. (#8701)
2024-09-22 12:34:14 -07:00
litianjian
5b59532760
[Model][VLM] Add LLaVA-Onevision model support (#8486)
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-22 10:51:44 -07:00
Huazhong Ji
ca2b628b3c
[MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler (#8703)
2024-09-22 10:44:09 -07:00
Alex Brooks
8ca5051b9a
[Misc] Use NamedTuple in Multi-image example (#8705)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-09-22 20:56:20 +08:00
Cyrus Leung
06ed2815e2
[Model] Refactor BLIP/BLIP-2 to support composite model loading (#8407)
2024-09-22 12:24:21 +00:00
youkaichao
0e40ac9b7b
[ci][build] fix vllm-flash-attn (#8699)
2024-09-21 23:24:58 -07:00
Isotr0py
13d88d4137
[Bugfix] Refactor composite weight loading logic (#8656)
2024-09-22 04:33:27 +00:00
Tyler Michael Smith
d66ac62854
[Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (#8643)
2024-09-21 23:45:02 +00:00
Divakar Verma
9dc7c6c7f3
[dbrx] refactor dbrx experts to extend FusedMoe class (#8518)
2024-09-21 15:09:39 -06:00
rasmith
ec4aaad812
[Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x speedup MI300, minor improvement for MI250 (#8646)
2024-09-21 09:20:54 +00:00
Andy Dai
4dfdf43196
[Doc] Fix typo in AMD installation guide (#8689)
2024-09-21 00:24:12 -07:00
Cyrus Leung
5e85f4f82a
[VLM] Use SequenceData.from_token_counts to create dummy data (#8687)
2024-09-20 23:28:56 -07:00
Luka Govedič
71c60491f2
[Kernel] Build flash-attn from source (#8245)
2024-09-20 23:27:10 -07:00
youkaichao
0faab90eb0
[beam search] add output for manually checking the correctness (#8684)
2024-09-20 19:55:33 -07:00
Cyrus Leung
0455c46ed4
[Core] Factor out common code in SequenceData and Sequence (#8675)
2024-09-21 02:30:39 +00:00
Kunshang Ji
d4bf085ad0
[MISC] add support custom_op check (#8557)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-20 19:03:55 -07:00
Cyrus Leung
0057894ef7
[Core] Rename PromptInputs and inputs(#8673)
2024-09-20 19:00:54 -07:00
zyddnys
0f961b3ce9
[Bugfix] Fix incorrect llava next feature size calculation (#8496)
2024-09-20 22:48:32 +00:00
omrishiv
7f9c8902e3
[Hardware][AWS] update neuron to 2.20 (#8676)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-09-20 15:19:44 -07:00
omrishiv
7c8566aa4f
[Doc] neuron documentation update (#8671)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-09-20 15:04:37 -07:00
Patrick von Platen
b4e4eda92e
[Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640)
2024-09-20 14:33:03 -07:00
Pastel!
2874bac618
[Bugfix] Config got an unexpected keyword argument 'engine' (#8556)
2024-09-20 14:00:45 -07:00
Cyrus Leung
035fa895ec
[Misc] Show AMD GPU topology in collect_env.py (#8649)
2024-09-20 13:52:19 -07:00
saumya-saran
b28298f2f4
[Bugfix] Validate SamplingParam n is an int (#8548)
2024-09-20 12:46:02 -07:00
Alexey Kondratiev(AMD)
2940afa04e
[CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (#8670)
2024-09-20 10:27:44 -07:00
Niklas Muennighoff
3b63de9353
[Model] Add OLMoE (#7922)
2024-09-20 09:31:41 -07:00
Jiaxin Shan
260d40b5ea
[Core] Support Lora lineage and base model metadata management (#6315)
2024-09-20 06:20:56 +00:00
William Lin
9e5ec35b1f
[bugfix] [AMD] add multi-step advance_step to ROCmFlashAttentionMetadata (#8474)
2024-09-19 20:49:54 -07:00
Amit Garg
18ae428a0d
[Bugfix] Fix Phi3.5 mini and MoE LoRA inference (#8571)
2024-09-20 08:54:02 +08:00
bnellnm
de6f90a13d
[Misc] guard against change in cuda library name (#8609)
2024-09-20 06:36:30 +08:00
Alexey Kondratiev(AMD)
6cb748e190
[CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (#8551)
2024-09-19 13:06:32 -07:00
Simon Mo
9e99407e3c
Create SECURITY.md (#8642)
2024-09-19 12:16:28 -07:00
Isotr0py
ea4647b7d7
[Doc] Add documentation for GGUF quantization (#8618)
2024-09-19 13:15:55 -06:00
盏一
e42c634acb
[Core] simplify logits resort in _apply_top_k_top_p (#8619)
2024-09-19 18:28:25 +00:00
Charlie Fu
9cc373f390
[Kernel][Amd] Add fp8 kv cache support for rocm custom paged attention (#8577)
2024-09-19 17:37:57 +00:00
Nick Hill
76515f303b
[Frontend] Use MQLLMEngine for embeddings models too (#8584)
2024-09-19 12:51:06 -04:00
Kunshang Ji
855c8ae2c9
[MISC] remove engine_use_ray in benchmark_throughput.py (#8615)
2024-09-18 22:33:20 -07:00
Kuntai Du
c52ec5f034
[Bugfix] fixing sonnet benchmark bug in benchmark_serving.py (#8616)
2024-09-19 05:24:24 +00:00