Issues
Open issues in 0bserver07/chimera:

- #160 [bench] Benchmark the real chimera code agent on GLM-5.2[1m] (agentic, not single-shot)
  Labels: agent-bench, benchmark, priority:high. Status: Open.
- #159 [audit] Adversarially verify documented capability claims against code + expand real-LLM coverage
  Labels: anti-hallucination, audit, priority:medium, transparency. Status: Open.
- #158 [bench] Light up the dark adapters — first live runs for 7 built-but-never-run benchmarks
  Labels: benchmark, priority:medium, transparency. Status: Open.
- #157 [bench] DeepSWE flagship comparative matrix — N architectures × M models, real Docker verifier (117 Harbor tasks)
  Labels: agent-bench, benchmark, priority:high, transparency. Status: Open.
- #151 [teams] Live-verify OpenCode + internal Chimera teammates end-to-end
  Labels: integration, priority:medium. Status: Open.
- #150 [teams] Unified per-teammate permission propagation across runtimes
  Labels: enhancement, priority:medium. Status: Open.
- #149 [teams] Real-time message push to running teammates
  Labels: enhancement, help wanted, priority:low. Status: Open.
- #146 [meta] Roadmap discussion — what's the next year of chimera-run?
  Labels: enhancement, question. Status: Open.
- #144 [env] Cloud sandbox providers — Modal, Daytona, Northflank/e2b
  Labels: enhancement, help wanted, priority:medium. Status: Open.
- #141 [bench] Run ProgramBench live — first 10 instances, GLM-5.1 + qwen3-coder
  Labels: benchmark, priority:high. Status: Open.
- #139 Terminal-Bench: path from 30% to 56% (adapter improvements)
  Labels: agent-bench, benchmark, priority:high. Status: Open.
- #127
  Status: Open.