jon-chuang
50b8d08dbd
[Misc/Testing] Use torch.testing.assert_close
( #7324 )
2024-08-16 04:24:04 +00:00
Joe
14dbd5a767
[Model] H2O Danube3-4b ( #6451 )
2024-07-26 20:47:50 -07:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory ( #5017 )
2024-06-15 04:45:31 +00:00
Woosuk Kwon
41ca62cf03
[Misc] Add CustomOp interface for device portability ( #5255 )
2024-06-05 09:18:19 -07:00
SnowDist
a22dea54d3
[Model] Support MAP-NEO model ( #5081 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-05-30 19:24:41 -07:00
Jinzhen Lin
33e0823de5
[Bugfix] fix rope error when load models with different dtypes ( #4835 )
2024-05-17 18:43:34 +09:00
Cyrus Leung
350f9e107f
[CI/Build] Move test_utils.py
to tests/utils.py
( #4425 )
...
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it only has been repeated twice so far, I will add another similar test suite in #4200 which would duplicate the code a third time)
Also, I have moved the test utilities file (test_utils.py) to under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file in order to relative import tests/utils.py.
2024-05-13 23:50:09 +09:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
Terry
7e9bd08f60
Add batched RoPE kernel ( #3095 )
2024-03-13 13:45:26 -07:00
Hongxia Yang
56f738ae9b
[ROCm] Fix some kernels failed unit tests ( #2498 )
2024-02-05 14:25:36 -08:00
Kunshang Ji
96b6f475dd
Remove hardcoded device="cuda"
to support more devices ( #2503 )
...
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2024-02-01 15:46:39 -08:00
Jee Li
77af974b40
[FIX] Support non-zero CUDA devices in custom kernels ( #1959 )
2024-01-02 19:09:59 -08:00
Woosuk Kwon
9b294976a2
Add PyTorch-native implementation of custom layers ( #1898 )
2023-12-02 21:18:40 -08:00
Yanming W
e0c6f556e8
[Build] Avoid building too many extensions ( #1624 )
2023-11-23 16:31:19 -08:00
Zhuohan Li
ba0bfd40e2
TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic ( #1181 )
2023-10-02 15:36:09 -07:00
Woosuk Kwon
e67b4f2c2a
Use FP32 in RoPE initialization ( #1004 )
...
Co-authored-by: One <imone@tuta.io>
2023-09-11 00:26:35 -07:00
Woosuk Kwon
320a622ec4
[BugFix] Implement RoPE for GPT-J ( #941 )
2023-09-06 11:54:33 +09:00
Woosuk Kwon
fbd80ad409
Clean up kernel unit tests ( #938 )
2023-09-05 16:57:38 -07:00
Zhuohan Li
d6fa1be3a8
[Quality] Add code formatter and linter ( #326 )
2023-07-03 11:31:55 -07:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM ( #150 )
2023-06-17 03:07:40 -07:00
Woosuk Kwon
a283ec2eec
Add contributing guideline and mypy config ( #122 )
2023-05-23 17:58:51 -07:00
Woosuk Kwon
825d8892b5
Use pytest format for unit tests ( #107 )
2023-05-17 17:11:23 -07:00