| Author | Commit | Message | Date |
|---|---|---|---|
| CHU Tianxiang | 0fbfc4b81b | Add GPTQ support (#916) | 2023-12-15 03:04:22 -08:00 |
| Woosuk Kwon | 5dd80d3777 | Fix latency benchmark script (#2035) | 2023-12-11 11:19:08 -08:00 |
| Antoni Baum | 05ff90b692 | Save pytorch profiler output for latency benchmark (#1871); Save profiler output; Apply feedback from code review | 2023-12-05 20:55:55 -08:00 |
| Woosuk Kwon | 51d3cb951d | Remove max_num_seqs in latency benchmark script (#1855) | 2023-11-30 00:00:32 -08:00 |
| Woosuk Kwon | e74b1736a1 | Add profile option to latency benchmark script (#1839) | 2023-11-29 23:42:52 -08:00 |
| chooper1 | 1f24755bf8 | Support SqueezeLLM (#1326). Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>, Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2023-10-21 23:14:59 -07:00 |
| Antoni Baum | acbed3ef40 | Use monotonic time where appropriate (#1249) | 2023-10-02 19:22:05 -07:00 |
| kg6-sleipnir | b5a10eb0ef | Added dtype arg to benchmarks (#1228) | 2023-09-30 21:04:03 -07:00 |
| Woosuk Kwon | e3e79e9e8a | Implement AWQ quantization support for LLaMA (#1032). Co-authored-by: Robert Irvine <robert@seamlessml.com>, root <rirv938@gmail.com>, Casper <casperbh.96@gmail.com>, julian-q <julianhquevedo@gmail.com> | 2023-09-16 00:03:37 -07:00 |
| Ricardo Lu | 8c4b2592fb | fix: enable trust-remote-code in api server & benchmark. (#509) | 2023-07-19 17:06:15 -07:00 |
| Woosuk Kwon | 4338cc4750 | [Tokenizer] Add an option to specify tokenizer (#284) | 2023-06-28 09:46:58 -07:00 |
| Woosuk Kwon | 0b98ba15c7 | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |
| Zhuohan Li | e5464ee484 | Rename servers to engines (#152) | 2023-06-17 17:25:21 +08:00 |
| Woosuk Kwon | 311490a720 | Add script for benchmarking serving throughput (#145) | 2023-06-14 19:55:38 -07:00 |
| Woosuk Kwon | 8274ca23ac | Add docstrings for LLM (#137) | 2023-06-04 12:52:41 -07:00 |
| Woosuk Kwon | 211318d44a | Add throughput benchmarking script (#133) | 2023-05-28 03:20:05 -07:00 |