A high-throughput and memory-efficient inference and serving engine for LLMs
GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm