9 Commits

Author SHA1 Message Date
Yanming W
e0c6f556e8
[Build] Avoid building too many extensions (#1624) 2023-11-23 16:31:19 -08:00
Zhuohan Li
ba0bfd40e2
TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) 2023-10-02 15:36:09 -07:00
Woosuk Kwon
e67b4f2c2a
Use FP32 in RoPE initialization (#1004) 2023-09-11 00:26:35 -07:00
Co-authored-by: One <imone@tuta.io>
Woosuk Kwon
320a622ec4
[BugFix] Implement RoPE for GPT-J (#941) 2023-09-06 11:54:33 +09:00
Woosuk Kwon
fbd80ad409
Clean up kernel unit tests (#938) 2023-09-05 16:57:38 -07:00
Zhuohan Li
d6fa1be3a8
[Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
Woosuk Kwon
a283ec2eec
Add contributing guideline and mypy config (#122) 2023-05-23 17:58:51 -07:00
Woosuk Kwon
825d8892b5
Use pytest format for unit tests (#107) 2023-05-17 17:11:23 -07:00