270 Commits

Author SHA1 Message Date

Woosuk Kwon
e3e79e9e8a
Implement AWQ quantization support for LLaMA (#1032)
Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>
2023-09-16 00:03:37 -07:00

Jasmond L
ab019eea75
Add Model Revision Support (#1014)
Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-09-13 15:20:02 -07:00

Antoni Baum
0bb1e885a0
Make max_model_len configurable (#972)
2023-09-12 16:29:19 -07:00

Kyujin Cho
898285c9bf
fix: CUDA error when inferencing with Falcon-40B base model (#992)
2023-09-10 01:39:02 -07:00

Zhuohan Li
c957c741d9
Enable safetensors loading for all models (#974)
2023-09-07 15:49:52 -07:00

Wen Sun
621980bdc0
fix: incorrect bigcode attention heads num (#676)
2023-08-04 10:35:22 -07:00

Zhuohan Li
1b0bd0fe8a
Add Falcon support (new) (#592)
2023-08-02 14:04:39 -07:00

Chaofan Lin
aa39e42c5a
fix doc (#622)
2023-07-31 13:11:57 -07:00

Zhuohan Li
58a072be15
[Fix] Add model sequence length into model config (#575)
2023-07-25 23:46:30 -07:00

Zhuohan Li
6fc2a38b11
Add support for LLaMA-2 (#505)
2023-07-20 11:38:27 -07:00

Lily Liu
b4b195b360
fix max seq len (#489)
2023-07-17 23:20:20 -07:00

Zhuohan Li
96853af5a8
Optimize MQA Kernel (#452)
2023-07-14 20:06:40 -04:00

Woosuk Kwon
ddfdf470ae
Add trust_remote_code arg to get_config (#405)
2023-07-08 15:24:17 -07:00

codethazine
a945fcc2ae
Add trust-remote-code flag to handle remote tokenizers (#364)
2023-07-07 11:04:58 -07:00

Woosuk Kwon
404422f42e
[Model] Add support for MPT (#334)
2023-07-03 16:47:53 -07:00

Zhuohan Li
d6fa1be3a8
[Quality] Add code formatter and linter (#326)
2023-07-03 11:31:55 -07:00

Lily Liu
dafd924c1f
Raise error for long prompt (#273)
2023-06-30 18:48:49 -07:00

Woosuk Kwon
998d9d1509
[Tokenizer] Add tokenizer mode (#298)
2023-06-28 14:19:22 -07:00

Woosuk Kwon
4338cc4750
[Tokenizer] Add an option to specify tokenizer (#284)
2023-06-28 09:46:58 -07:00

Woosuk Kwon
0b98ba15c7
Change the name to vLLM (#150)
2023-06-17 03:07:40 -07:00