Jee Jee Li
295a061fb3
[Kernel] add kernel for FATReLU ( #9610 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-10-24 16:18:27 +08:00
Cyrus Leung
6ffa3f314c
[CI/Build] Avoid CUDA initialization ( #8534 )
2024-09-18 10:38:11 +00:00
bnellnm
73202dbe77
[Kernel][Misc] register ops to prevent graph breaks ( #6917 )
...
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2024-09-11 12:52:19 -07:00
jon-chuang
50b8d08dbd
[Misc/Testing] Use torch.testing.assert_close
( #7324 )
2024-08-16 04:24:04 +00:00
Woosuk Kwon
41ca62cf03
[Misc] Add CustomOp interface for device portability ( #5255 )
2024-06-05 09:18:19 -07:00
Cyrus Leung
350f9e107f
[CI/Build] Move test_utils.py
to tests/utils.py
( #4425 )
...
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it only has been repeated twice so far, I will add another similar test suite in #4200 which would duplicate the code a third time)
Also, I have moved the test utilities file (test_utils.py) to under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file in order to relative import tests/utils.py.
2024-05-13 23:50:09 +09:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
Woosuk Kwon
602358f8a8
Add kernel for GeGLU with approximate GELU ( #3337 )
2024-03-12 22:06:17 -07:00
Woosuk Kwon
fd5dcc5c81
Optimize GeGLU layer in Gemma ( #2975 )
2024-02-21 20:17:52 -08:00
Hongxia Yang
56f738ae9b
[ROCm] Fix some kernels failed unit tests ( #2498 )
2024-02-05 14:25:36 -08:00
Kunshang Ji
96b6f475dd
Remove hardcoded device="cuda"
to support more devices ( #2503 )
...
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2024-02-01 15:46:39 -08:00
Jee Li
77af974b40
[FIX] Support non-zero CUDA devices in custom kernels ( #1959 )
2024-01-02 19:09:59 -08:00
Woosuk Kwon
9b294976a2
Add PyTorch-native implementation of custom layers ( #1898 )
2023-12-02 21:18:40 -08:00
Yanming W
e0c6f556e8
[Build] Avoid building too many extensions ( #1624 )
2023-11-23 16:31:19 -08:00
Zhuohan Li
ba0bfd40e2
TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic ( #1181 )
2023-10-02 15:36:09 -07:00
Woosuk Kwon
fbd80ad409
Clean up kernel unit tests ( #938 )
2023-09-05 16:57:38 -07:00
Woosuk Kwon
d64bf1646c
Implement approximate GELU kernels ( #828 )
2023-08-23 07:43:21 +09:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM ( #150 )
2023-06-17 03:07:40 -07:00
Woosuk Kwon
825d8892b5
Use pytest format for unit tests ( #107 )
2023-05-17 17:11:23 -07:00