Running the latency demo with CUDA 13 generates invalid tokens; switching to CUDA 12.8 fixes the issue #9

Description

@AnDongLi

When I build ml-llama with CUDA 13 and then run the command below:

python megakernels/scripts/generate.py mode=mk prompt="tell me a funny joke about cookies" ntok=100

All generated tokens except the first one are "!".
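
For anyone trying to reproduce this, a minimal sketch of how the symptom could be checked automatically. The helper name `looks_degenerate` and the idea of passing the decoded tokens as a list of strings are assumptions made for illustration; neither comes from the repository.

```python
def looks_degenerate(tokens, filler="!"):
    """Return True when every token after the first one equals `filler`.

    `tokens` is a list of decoded token strings (hypothetical input format).
    A single-token output is not treated as degenerate.
    """
    rest = tokens[1:]
    return len(rest) > 0 and all(tok.strip() == filler for tok in rest)


if __name__ == "__main__":
    # Example shaped like the output described above: one normal token, then only "!".
    sample = ["Why"] + ["!"] * 99
    print(looks_degenerate(sample))
```

A check like this only flags the specific pattern reported here; other kinds of invalid output would need a different test.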

After asking Gemini AI, I switched to a cloud VM with CUDA 12.8, and the demo works there.
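
To make it easier to confirm which toolkit a machine is using before building, here is a small sketch that reads the release number from `nvcc --version`. The function names are hypothetical, and the "13 or newer" warning reflects only what was observed in this report, not a confirmed compatibility statement for the project.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract (major, minor) from nvcc's 'release X.Y' line, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Return the (major, minor) release reported by nvcc, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found on PATH, or its version could not be parsed")
    elif release[0] >= 13:
        # Assumption based on this report only: CUDA 13 produced "!" tokens.
        print(f"CUDA {release[0]}.{release[1]} detected; this report saw invalid tokens with CUDA 13")
    else:
        print(f"CUDA {release[0]}.{release[1]} detected")
```

Note that this reports the toolkit that `nvcc` on the PATH belongs to, which may differ from the CUDA version a Python package was built against.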
