Reduce latency, improve performance, and responsibly support AI workloads. ☁️💻⚡ - CSDN blink, a leading developer technology community

1 year ago

weixin_41110484

Reduce latency, improve performance, and responsibly support AI workloads. ☁️💻⚡

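Meta's Llama 3.2 models are now in use, powering cutting-edge applications and sparking innovation through sophisticated image understanding and visual reasoning.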

👉 https://go.aws/47IEqY8

Next post:

Original tweet: Llama 3.2 1B, quantized to 4 bits, generates about 350 tokens per second (!) on an M2 Ultra. This is fun. Command:

mlx_lm.generate --model mlx-community/Llama-3.2-1B-Instruct-4bit --prompt "Write a story about Einstein" --temp 0.0 --max-tokens 512

Not sped up: