jon-chuang | 50b8d08dbd | [Misc/Testing] Use torch.testing.assert_close (#7324) | 2024-08-16 04:24:04 +00:00
Qubitium-ModelCloud | ee93f4f92a | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00
    Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
    Co-authored-by: ZX <zx@lbx.dev>
Woosuk Kwon | 190bc838e1 | [Misc] Remove unnecessary ModelRunner imports (#4703) | 2024-05-09 00:17:17 -07:00
SangBin Cho | 3521ba4f25 | [Core][Model runner refactoring 1/N] Refactor attn metadata term (#4518) | 2024-05-03 10:20:12 -07:00
SangBin Cho | 603ad84815 | [Core] Refactoring sampler and support prompt logprob for chunked prefill (#4309) | 2024-04-26 13:02:02 +00:00
Antoni Baum | 69e1d2fb69 | [Core] Refactor model loading code (#4097) | 2024-04-16 11:34:39 -07:00
Roy | f1c0fc3919 | Migrate logits computation and gather to model_runner (#3233) | 2024-03-20 23:25:01 +00:00