Breno Faria
87d41c849d
[BUGFIX] [FRONTEND] Correct chat logprobs ( #5029 )
...
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
2024-05-30 02:52:14 -07:00
Cyrus Leung
eecd864388
[Bugfix][CI/Build] Fix test and improve code for merge_async_iterators ( #5096 )
2024-05-29 16:02:25 -07:00
Cyrus Leung
5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines ( #4328 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Cyrus Leung
350f9e107f
[CI/Build] Move test_utils.py to tests/utils.py ( #4425 )
...
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for the OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it has only been repeated twice so far, I will add another similar test suite in #4200, which would duplicate the code a third time.)
Also, I have moved the test utilities file (test_utils.py) into the tests directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file so that tests/utils.py can be imported with relative imports.
2024-05-13 23:50:09 +09:00
Cyrus Leung
f12b20decc
[Frontend] Move async logic outside of constructor ( #4674 )
2024-05-08 22:48:33 -07:00
Sebastian Schoennenbeck
f8e7adda21
Fix/async chat serving ( #2727 )
2024-05-03 11:04:14 -07:00
Ruoyu Qin
dfea173148
[Bugfix] Abort requests when the connection to /v1/completions is interrupted ( #4363 )
2024-04-27 09:48:37 -07:00
Roy
7134303cbb
[Bugfix][Core] Fix get decoding config from ray ( #4335 )
2024-04-27 11:30:08 +00:00
Cyrus Leung
1e8f4252aa
[Bugfix][Frontend] Raise exception when file-like chat template fails to be opened ( #4292 )
2024-04-23 18:19:03 +00:00
SangBin Cho
4e7ee664e2
[Core] Fix engine-use-ray broken ( #4105 )
2024-04-16 05:24:53 +00:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
Antoni Baum
fb96c1e98c
Asynchronous tokenization ( #2879 )
2024-03-15 23:37:01 +00:00
Zhuohan Li
2f8844ba08
Re-enable the 80 char line width limit ( #3305 )
2024-03-10 19:49:14 -07:00
Roy
9e8744a545
[BugFix] Fix get tokenizer when using ray ( #3301 )
2024-03-10 19:17:16 -07:00
Antoni Baum
ff578cae54
Add health check, make async Engine more robust ( #3015 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-03-04 22:01:40 +00:00
Antoni Baum
017d9f1515
Add metrics to RequestOutput ( #2876 )
2024-02-20 21:55:57 -08:00
Antoni Baum
9b945daaf1
[Experimental] Add multi-LoRA support ( #1804 )
...
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
2024-01-23 15:26:37 -08:00
FlorianJoncour
14cc317ba4
OpenAI Server refactoring ( #2360 )
2024-01-16 21:33:14 -08:00
Simon Mo
6e01e8c1c8
[CI] Add Buildkite ( #2355 )
2024-01-14 12:37:58 -08:00
Zhuohan Li
fd4ea8ef5c
Use NCCL instead of ray for control-plane communication to remove serialization overhead ( #2221 )
2024-01-03 11:30:22 -08:00
Zhuohan Li
66b108d142
[BUGFIX] Fix API server test ( #2270 )
2023-12-26 10:37:06 -08:00
Adam Brusselback
66785cc05c
Support chat template and echo for chat API ( #1756 )
2023-11-30 16:43:13 -08:00
Simon Mo
5ffc0d13a2
Migrate linter from pylint to ruff ( #1665 )
2023-11-20 11:58:01 -08:00
Zhuohan Li
9d9072a069
Implement prompt logprobs & Batched topk for computing logprobs ( #1328 )
...
Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com>
2023-10-16 10:56:50 -07:00
Zhuohan Li
ba0bfd40e2
TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic ( #1181 )
2023-10-02 15:36:09 -07:00
Antoni Baum
ff36139ffc
Remove AsyncLLMEngine busy loop, shield background task ( #1059 )
2023-09-17 00:29:08 -07:00
Antoni Baum
080438477f
Start background task in AsyncLLMEngine.generate ( #988 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-09-08 00:03:39 -07:00
Antoni Baum
c07ece5ca4
Make AsyncLLMEngine more robust & fix batched abort ( #969 )
...
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
2023-09-07 13:43:45 -07:00