When I build ml-llama with CUDA 13 and then run the command below:
python megakernels/scripts/generate.py mode=mk prompt="tell me a funny joke about cookies" ntok=100
All generated tokens except the first one are "!".
After asking Gemini AI, I switched to a cloud VM with CUDA 12.8, and the demo works there.
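For anyone trying to confirm which toolkit they are on before rebuilding, here is a minimal sketch that reads the release number from `nvcc --version`. It assumes `nvcc` is on PATH and that its output contains a "release X.Y" string; the helper names `parse_cuda_release` and `installed_cuda_release` are made up for this example and are not part of the repo.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    # Assumption: the output contains a fragment such as "release 12.8".
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc --version` if nvcc is on PATH; return (major, minor) or None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found or version string not recognized")
    else:
        print("CUDA toolkit release: %d.%d" % release)
        if release[0] >= 13:
            # Based only on this report: CUDA 13 gave "!" tokens, 12.8 worked.
            print("Note: the all-'!' output was seen with CUDA 13.")
```

This only reports the toolkit that `nvcc` belongs to; it does not check the driver version or the CUDA version other libraries were built against.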