Hi, I noticed that some benchmark runs use --timeout-multiplier. Is there a recommended --timeout-multiplier setting per model, or a guideline for choosing the right value? For example, for slower/larger models, what multiplier do you suggest? Does the official leaderboard use a fixed multiplier across all models, or does it vary?