40 Commits

Author SHA1 Message Date
Kyle Mistele
e02ce498be
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-04 13:18:13 -07:00
Roger Wang
5b86b19954
[Misc] Optional installation of audio related packages (#8063) 2024-09-01 14:46:57 -07:00
Kaunil Dhruv
058344f89a
[Frontend]-config-cli-args (#7737)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>
2024-08-30 08:21:02 -07:00
Patrick von Platen
6fc4e6e07a
[Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739) 2024-08-27 12:40:02 +00:00
Cyrus Leung
baaedfdb2d
[mypy] Enable following imports for entrypoints (#7248)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
2024-08-20 23:28:21 -07:00
SangBin Cho
ff7ec82c4d
[Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
Xander Johnson
7c0b7ea214
[Bugfix] add >= 1.0 constraint for openai dependency (#7612) 2024-08-16 20:56:01 -07:00
PHILO-HE
f4da5f7b6d
[Misc] Update dockerfile for CPU to cover protobuf installation (#7182) 2024-08-15 10:03:01 -07:00
Kyle Sayers
f55a9aea45
[Misc] Revert compressed-tensors code reuse (#7521) 2024-08-14 15:07:37 -07:00
youkaichao
16422ea76f
[misc][plugin] add plugin system implementation (#7426) 2024-08-13 16:24:17 -07:00
Kyle Sayers
373538f973
[Misc] compressed-tensors code reuse (#7277) 2024-08-13 19:05:15 -04:00
Peter Salas
00c3d68e45
[Frontend][Core] Add plumbing to support audio language models (#7446) 2024-08-13 17:39:33 +00:00
Daniele
774cd1d3bf
[CI/Build] bump minimum cmake version (#6999) 2024-08-12 16:29:20 -07:00
Noam Gat
4fb7b52a2c
Updating LM Format Enforcer version to v0.10.6 (#7189) 2024-08-11 08:11:50 -04:00
Cyrus Leung
7eb4a51c5f
[Core] Support serving encoder/decoder models (#7258) 2024-08-09 10:39:41 +08:00
Isotr0py
360bd67cf0
[Core] Support loading GGUF model (#5191)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-05 17:54:23 -06:00
Michael Goin
421e218b37
[Bugfix] Bump transformers to 4.43.2 (#6752) 2024-07-24 13:22:16 -07:00
Roger Wang
1bedf210e3
Bump transformers version for Llama 3.1 hotfix and patch Chameleon (#6690) 2024-07-23 13:47:48 -07:00
Noam Gat
9da4aad44b
Updating LM Format Enforcer version to v10.3 (#6411) 2024-07-13 10:09:12 +00:00
Roger Wang
f7160d946a
[Misc][Bugfix] Update transformers for tokenizer issue (#6364) 2024-07-12 08:40:07 +00:00
Lim Xiang Yang
fc17110bbe
[BugFix]: set outlines pkg version (#6262) 2024-07-11 04:37:11 +00:00
youkaichao
da78caecfa
[core][distributed] zmq fallback for broadcasting large objects (#6183)
2024-07-09 18:49:11 -07:00
danieljannai21
2c37540aa6
[Frontend] Add template related params to request (#5709) 2024-07-01 23:01:57 -07:00
Woosuk Kwon
79c92c7c8a
[Model] Add Gemma 2 (#5908) 2024-06-27 13:33:56 -07:00
Cyrus Leung
e9c2732b97
[CI/Build] Add tqdm to dependencies (#5680) 2024-06-19 08:37:33 -06:00
youkaichao
f07d513320
[build][misc] limit numpy version (#5582) 2024-06-16 16:07:01 -07:00
Breno Faria
7b0a0dfb22
[Frontend][Core] Update Outlines Integration from FSM to Guide (#4109)
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
2024-06-05 16:49:12 -07:00
Cyrus Leung
7a64d24aad
[Core] Support image processor (#4197) 2024-06-02 22:56:41 -07:00
Michael Goin
86b45ae065
[Bugfix] Relax tiktoken to >= 0.6.0 (#4890) 2024-05-17 12:58:52 -06:00
Alex Wu
52f8107cf2
[Frontend] Support OpenAI batch file format (#4794)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-05-15 19:13:36 -04:00
Noam Gat
bd99d22629
Update lm-format-enforcer to 0.10.1 (#4631) 2024-05-06 23:51:59 +00:00
Ronen Schaffer
bf480c5302
Add more Prometheus metrics (#2764)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2024-04-28 15:59:33 -07:00
Cyrus Leung
8947bc3c15
[Frontend][Bugfix] Disallow extra fields in OpenAI API (#4355) 2024-04-27 05:08:24 +00:00
Noam Gat
cc74b2b232
Updating lm-format-enforcer version and adding links to decoding libraries in docs (#4222) 2024-04-20 08:33:16 +00:00
Nick Hill
87fa80c91f
[Misc] Bump transformers to latest version (#4176) 2024-04-18 14:36:39 -07:00
Noam Gat
05434764cd
LM Format Enforcer Guided Decoding Support (#3868)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-04-16 05:54:57 +00:00
Zhuohan Li
e11e200736
[Bugfix] Fix filelock version requirement (#4075) 2024-04-14 21:50:08 -07:00
SangBin Cho
09473ee41c
[mypy] Add mypy type annotation part 1 (#4006) 2024-04-12 14:35:50 -07:00
SangBin Cho
18de883489
[Chunked Prefill][4/n] Chunked prefill scheduler. (#3853) 2024-04-05 10:17:58 -07:00
Woosuk Kwon
cfaf49a167
[Misc] Define common requirements (#3841) 2024-04-05 00:39:17 -07:00