Alan Gray (NVIDIA) wrote an article on implementing CUDA Graphs in llama.cpp: https://developer.nvidia.com/blog/optimizing-llama-cpp-ai-inference-with-cuda-graphs/