vllm/vllm at d6770d1f23b642289a2f1462f2851a7be9d3cc83

20231088/vllm

Kyujin Cho 898285c9bf

fix: CUDA error when inferencing with Falcon-40B base model (#992)

2023-09-10 01:39:02 -07:00

Make AsyncLLMEngine more robust & fix batched abort (#969)

2023-09-07 13:43:45 -07:00

fix "tansformers_module" ModuleNotFoundError when load model with trust_remote_code=True (#871)

2023-09-08 17:21:30 -07:00

Start background task in AsyncLLMEngine.generate (#988)

2023-09-08 00:03:39 -07:00

Fix wrong dtype in PagedAttentionWithALiBi bias (#996)

2023-09-09 14:58:35 -07:00

transformers_utils

Only emit warning about internal tokenizer if it isn't being used (#939)

2023-09-05 00:50:55 +09:00

Align vLLM's beam search implementation with HF generate (#857)

2023-09-04 17:29:42 -07:00

__init__.py

Bump up the version to v0.1.6 (#989)

2023-09-08 00:07:46 -07:00

block.py

[Quality] Add code formatter and linter (#326)

2023-07-03 11:31:55 -07:00

config.py

fix: CUDA error when inferencing with Falcon-40B base model (#992)

2023-09-10 01:39:02 -07:00

logger.py

[Quality] Add code formatter and linter (#326)

2023-07-03 11:31:55 -07:00

outputs.py

Align vLLM's beam search implementation with HF generate (#857)

2023-09-04 17:29:42 -07:00

sampling_params.py

Align vLLM's beam search implementation with HF generate (#857)

2023-09-04 17:29:42 -07:00

sequence.py

Align vLLM's beam search implementation with HF generate (#857)

2023-09-04 17:29:42 -07:00

utils.py

[Quality] Add code formatter and linter (#326)

2023-07-03 11:31:55 -07:00