Dynamic Scheduling for Large Language Model Serving | Ray Summit 2024

Hanyu Zhao from Alibaba Group presents Llumnix, a dynamic request scheduling system for large language models, at Ray Summit 2024. Built on vLLM and Ray, Llumnix addresses key challenges in LLM serving through innovative runtime rescheduling and KV cache migration across instances. Zhao discusses how Llumnix reduces prefill latencies through cross-instance defragmentation and minimizes tail decoding latencies by balancing loads and reducing preemptions.

The talk covers the research journey behind Llumnix, from its origins to its publication at OSDI '24, and its subsequent deployment and evolution at Alibaba. The presentation provides insights into the current state of Llumnix and outlines future development plans. Zhao also highlights the open-source nature of the project, available on GitHub, encouraging community engagement and collaboration.

This session offers valuable information for those interested in optimizing LLM serving, particularly in large-scale, high-performance environments. It demonstrates practical applications of Ray and vLLM in addressing complex scheduling challenges in AI infrastructure.

--

Interested in more?
Watch the full Day 1 Keynote: • Ray Summit 2024 Keynote Day 1 | Where...
Watch the full Day 2 Keynote: • Ray Summit 2024 Keynote Day 2 | Where...

--

🔗 Connect with us:
Subscribe to our YouTube channel: / @anyscale
Twitter: https://x.com/anyscalecompute
LinkedIn: / joinanyscale
Website: https://www.anyscale.com
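To make the scheduling idea concrete, here is a minimal, hypothetical Python sketch of the decision logic described above: watch per-instance KV-cache usage and, when the gap between the most and least loaded instances grows large enough, pick a running request to migrate (with its KV cache) to the lighter instance. The class and function names below are illustrative only and are not Llumnix's actual API; the real system performs live migration on top of vLLM and Ray, while this sketch only shows a load-gap heuristic.

    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    @dataclass
    class Instance:
        """A serving instance with a simple load metric (KV-cache blocks in use)."""
        name: str
        kv_blocks_used: int
        kv_blocks_total: int
        requests: List[str] = field(default_factory=list)

        @property
        def load(self) -> float:
            return self.kv_blocks_used / self.kv_blocks_total

    def pick_migration(
        instances: List[Instance], threshold: float = 0.2
    ) -> Optional[Tuple[Instance, Instance, str]]:
        """Return (source, target, request) if the load gap justifies a migration.

        Mirrors the talk's description at a high level: moving a running request's
        KV cache off an overloaded instance reduces preemptions of long decodes
        and frees space on the source for new prefills.
        """
        src = max(instances, key=lambda i: i.load)
        dst = min(instances, key=lambda i: i.load)
        if src.load - dst.load < threshold or not src.requests:
            return None  # loads are balanced enough; no migration needed
        return src, dst, src.requests[0]

    if __name__ == "__main__":
        cluster = [
            Instance("gpu-0", kv_blocks_used=900, kv_blocks_total=1000,
                     requests=["req-a", "req-b"]),
            Instance("gpu-1", kv_blocks_used=200, kv_blocks_total=1000,
                     requests=["req-c"]),
        ]
        decision = pick_migration(cluster)
        if decision:
            src, dst, req = decision
            print(f"migrate {req}: {src.name} -> {dst.name}")

Running this prints "migrate req-a: gpu-0 -> gpu-1"; in a real deployment the threshold and load metric would be tuned to the workload, and the migration itself would stream the KV cache between instances rather than simply reassigning the request.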
