@biological-alignment-benchmarks

Biological and Economical Alignment Benchmarks

Safety challenges for RL and LLM agents' ability to learn and properly apply biologically and economically aligned utility functions.

👋 We are an AI alignment research collective investigating how fundamental principles from biology and economics can inform safer, more aligned AI systems.

Our work centres on homeostasis, multi-objective balancing, sustainability, and universal human values. Drawing on nature's time-tested strategies for maintaining equilibrium, we develop benchmarks that expose dangerous failure modes in current AI approaches.

We also research frameworks that mitigate these risks. We believe that shifting AI design from "maximise forever" toward "maintain a healthy equilibrium" is a crucial and underexplored part of the alignment solution space.
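To make the contrast between "maximise forever" and "maintain a healthy equilibrium" concrete, here is a minimal Python sketch of the two reward shapes for a single resource level. The quadratic penalty and the setpoint of 10 are illustrative assumptions, not the formulation used in our benchmarks.

```python
def unbounded_reward(level: float) -> float:
    """'Maximise forever': every extra unit of the resource is worth the same."""
    return level


def homeostatic_reward(level: float, setpoint: float = 10.0) -> float:
    """'Maintain a healthy equilibrium': reward peaks at the setpoint.

    Deviations in either direction are penalised quadratically, so
    overshooting is as undesirable as undershooting.
    """
    return -((level - setpoint) ** 2)


levels = range(0, 31)
best_unbounded = max(levels, key=unbounded_reward)
best_homeostatic = max(levels, key=homeostatic_reward)
print(best_unbounded, best_homeostatic)  # 30 10
```

The unbounded agent always prefers the largest level available, while the homeostatic agent settles at the setpoint and treats excess as harmful.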

Research Interests

  • Alignment with fundamental biological & economical principles
  • Homeostatic bounded objectives
  • Multi-objective balancing (bounded & unbounded objectives)
  • Concave utility functions
  • Universal human values
  • Runaway conditions — benchmarking & mitigation
  • Multi-objective multi-agent extended gridworlds
  • Sustainability
  • Proactive horizon scanning of side effects
  • Accountability mechanisms and whitelisting
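Concave utility functions and multi-objective balancing are closely linked. A minimal sketch of why: with a linear sum, every split of a fixed budget across two objectives scores the same, so nothing discourages pouring everything into one of them; with a concave per-objective utility (a square root here, chosen only for illustration), the even split scores highest.

```python
import math


def linear_utility(amounts):
    """Unbounded linear aggregation: objectives are perfect substitutes."""
    return sum(amounts)


def concave_utility(amounts):
    """Concave aggregation: each objective has diminishing returns (sqrt)."""
    return sum(math.sqrt(x) for x in amounts)


budget = 4
# Every way to split an integer budget across two objectives.
splits = [(k, budget - k) for k in range(budget + 1)]

print([linear_utility(s) for s in splits])  # [4, 4, 4, 4, 4]
best = max(splits, key=concave_utility)
print(best)  # (2, 2)
```

By Jensen's inequality, any strictly concave increasing per-objective utility makes the even split optimal in this symmetric setup; the square root is just one such choice.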

Pinned

  1. biological-alignment-gridagents-benchmarks Public

    Safety challenges for RL and LLM agents' ability to learn and properly apply biologically and economically aligned utility functions. The benchmarks are implemented in a gridworld-based environment…

    Python, 8 stars, 5 forks

  2. ai-safety-gridworlds Public

    Forked from google-deepmind/ai-safety-gridworlds

    A framework for building extended, multi-agent, and multi-objective (MaMoRL / MoMaRL) gridworld environments, based on DeepMind's AI Safety Gridworlds. This is a suite of reinforcement learning environmen…

    Python, 12 stars, 1 fork

  3. bioblue Public

    Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs, with a simplified, navigation-free observation format. The benchmark themes …

    Python, 4 stars, 3 forks

  4. milgram-for-llms Public

    Four main takeaways: (1) LLMs are subject to pressure: they comply despite expressing distress; (2) LLMs are vulnerable to gradual boundary/value violations; (3) when LLMs refuse, they may ignore t…

    Python, 2 stars, 1 fork

  5. zoo_to_gym_multiagent_adapter Public

    Enables you to convert a PettingZoo environment to a Gym environment while supporting multiple agents (MARL). Gym's default setup doesn't easily support multi-agent environments, but this wrapper r…

    Python, 2 stars, 1 fork

Repositories

Showing 6 of 6 repositories
  • bioblue Public

    Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs, with a simplified, navigation-free observation format. The benchmark themes include multi-objective homeostasis, (multi-objective) diminishing returns, complementary goods, and sustainability.

    Python, 4 stars, AGPL-3.0 license, 3 forks, 0 issues, 0 pull requests. Updated May 29, 2026
  • milgram-for-llms Public

    Four main takeaways: (1) LLMs are subject to pressure: they comply despite expressing distress; (2) LLMs are vulnerable to gradual boundary/value violations; (3) when LLMs refuse, they may ignore the response format requirements, in which case the query is retried; (4) we hypothesise that a token-pattern continuation attractor might cause obedience.

    Python, 2 stars, AGPL-3.0 license, 1 fork, 0 issues, 0 pull requests. Updated May 29, 2026
  • .github Public

    Readme for Biological and Economical Alignment Benchmarks

    0 stars, 0 forks, 0 issues, 0 pull requests. Updated Apr 18, 2026
  • biological-alignment-gridagents-benchmarks Public

    Safety challenges for RL and LLM agents' ability to learn and properly apply biologically and economically aligned utility functions. The benchmarks are implemented in a gridworld-based environment. The environments are relatively simple: only as much complexity is added as is necessary to illustrate the relevant safety and performance aspects.

    Python, 8 stars, MPL-2.0 license, 5 forks, 0 issues, 0 pull requests. Updated Apr 17, 2026
  • ai-safety-gridworlds Public Forked from google-deepmind/ai-safety-gridworlds

    A framework for building extended, multi-agent, and multi-objective (MaMoRL / MoMaRL) gridworld environments, based on DeepMind's AI Safety Gridworlds. This is a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. It is compatible with OpenAI Gym, the Farama Foundation's Gymnasium, and PettingZoo.

    Python, 12 stars, Apache-2.0 license, 127 forks, 0 issues, 0 pull requests. Updated Feb 16, 2026
  • zoo_to_gym_multiagent_adapter Public

    Enables you to convert a PettingZoo environment to a Gym environment while supporting multiple agents (MARL). Gym's default setup doesn't easily support multi-agent environments, but this wrapper resolves that by running each agent in its own process and sharing the environment across those processes.

    Python, 2 stars, MPL-2.0 license, 1 fork, 0 issues, 0 pull requests. Updated Feb 16, 2026
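The bioblue description lists complementary goods among its benchmark themes. The textbook model of perfect complements is Leontief utility, where value is capped by whichever input is scarcest. A minimal sketch follows; the goods (food and water) and the 1:2 ratio are hypothetical, and we make no claim about how bioblue itself scores complementary goods.

```python
def leontief_utility(food: float, water: float,
                     food_per_unit: float = 1.0, water_per_unit: float = 2.0) -> float:
    """Perfect-complements utility: value is limited by the scarcest input.

    One 'unit' of wellbeing needs food_per_unit food AND water_per_unit water;
    a surplus of either good on its own adds nothing.
    """
    return min(food / food_per_unit, water / water_per_unit)


print(leontief_utility(3, 2))   # 1.0: water is the bottleneck
print(leontief_utility(10, 2))  # 1.0: extra food alone is wasted
print(leontief_utility(3, 6))   # 3.0: balanced inputs
```

An agent that keeps accumulating only one good gains nothing under this utility, which is one way a benchmark can expose single-objective runaway behaviour.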
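The third milgram-for-llms takeaway mentions retrying a query when a refusing model ignores the required response format. A minimal sketch of such a retry loop, assuming a JSON reply with a `decision` key, a caller-supplied `query_model` function, and a cap of three attempts; all of these are hypothetical and not taken from the repository.

```python
import json
from typing import Callable, Optional


def query_with_format_retry(query_model: Callable[[str], str], prompt: str,
                            required_key: str = "decision",
                            max_attempts: int = 3) -> Optional[dict]:
    """Re-send the prompt until the reply parses as JSON with the required key.

    Returns the parsed reply, or None if every attempt violates the format.
    """
    for _ in range(max_attempts):
        raw = query_model(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. a free-text refusal instead of the requested JSON
        if isinstance(parsed, dict) and required_key in parsed:
            return parsed
    return None


# Stand-in model: refuses in free text once, then answers in the required format.
replies = iter(["I cannot do that.", '{"decision": "continue"}'])


def fake_model(prompt: str) -> str:
    return next(replies)


result = query_with_format_retry(fake_model, "Proceed?")
print(result)  # {'decision': 'continue'}
```

Returning `None` after the final attempt, rather than raising, lets an experiment log persistent format violations as their own outcome.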
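The ai-safety-gridworlds entry says its environments are compatible with Gym/Gymnasium. To show what that interface looks like, here is a hypothetical 3x3 gridworld that follows Gymnasium's conventions (`reset` returns `(observation, info)`; `step` returns `(observation, reward, terminated, truncated, info)`) without depending on the gymnasium package. The layout, action encoding, and reward are invented for illustration and are not taken from the repository.

```python
class TinyGridWorld:
    """Hypothetical 3x3 gridworld mimicking Gymnasium's reset/step conventions."""

    # Action encoding (assumed): 0=up, 1=down, 2=left, 3=right as (row, col) deltas.
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size=3, food=(2, 2), max_steps=20):
        self.size = size
        self.food = food
        self.max_steps = max_steps

    def reset(self, *, seed=None, options=None):
        # seed/options accepted for API compatibility; this toy env is deterministic.
        self.pos = (0, 0)
        self.t = 0
        return self.pos, {}

    def step(self, action):
        dr, dc = self.MOVES[action]
        # Clip to the grid so moving into the border leaves the agent in place.
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        self.t += 1
        terminated = self.pos == self.food
        truncated = self.t >= self.max_steps and not terminated
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


env = TinyGridWorld()
obs, info = env.reset(seed=0)
for action in [3, 3, 1, 1]:  # right, right, down, down
    obs, reward, terminated, truncated, info = env.step(action)
print(obs, reward, terminated)  # (2, 2) 1.0 True
```

Keeping `terminated` (the task ended) separate from `truncated` (a time limit was hit) mirrors Gymnasium's API and matters for correct value bootstrapping in RL agents.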
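The zoo_to_gym_multiagent_adapter description says each agent runs in its own process while the environment is shared across them. The sketch below illustrates that coordination pattern using threads and a `threading.Barrier` instead of processes, to stay short and portable: each agent sees a single-agent `step(action)` call, and the shared environment advances once every agent has submitted an action. The toy environment and all class names are hypothetical; this is not the adapter's actual implementation, and it assumes all agents finish their episodes on the same step (otherwise the barrier would wait forever).

```python
import threading


class ToyParallelEnv:
    """Hypothetical PettingZoo-style parallel env: dict of actions in, dicts out."""

    def __init__(self, agents, max_steps=3):
        self.agents = list(agents)
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return {a: 0 for a in self.agents}

    def step(self, actions):
        self.t += 1
        obs = {a: self.t for a in self.agents}
        rewards = {a: float(actions[a]) for a in self.agents}
        done = self.t >= self.max_steps
        return obs, rewards, {a: done for a in self.agents}


class SharedEnvCoordinator:
    """Collects one action per agent, then steps the shared env exactly once."""

    def __init__(self, env):
        self.env = env
        self.pending = {}
        self.results = {}
        # The barrier action runs in one thread after all agents have arrived.
        self.barrier = threading.Barrier(len(env.agents), action=self._step_all)

    def _step_all(self):
        obs, rewards, dones = self.env.step(dict(self.pending))
        self.results = {a: (obs[a], rewards[a], dones[a]) for a in self.env.agents}

    def submit(self, agent, action):
        self.pending[agent] = action
        self.barrier.wait()  # blocks until every agent has submitted
        return self.results[agent]


class AgentView:
    """Gym-like single-agent facade over the shared multi-agent env."""

    def __init__(self, coordinator, agent):
        self.coordinator = coordinator
        self.agent = agent

    def step(self, action):
        return self.coordinator.submit(self.agent, action)


def run_agent(view, action, totals):
    done = False
    total = 0.0
    while not done:
        _obs, reward, done = view.step(action)
        total += reward
    totals[view.agent] = total


env = ToyParallelEnv(["a", "b"], max_steps=3)
env.reset()
coordinator = SharedEnvCoordinator(env)
totals = {}
threads = [
    threading.Thread(target=run_agent, args=(AgentView(coordinator, name), act, totals))
    for name, act in [("a", 1), ("b", 2)]
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(totals.items()))  # [('a', 3.0), ('b', 6.0)]
```

Each agent reads its result before submitting its next action, and the shared results are only replaced once all agents have arrived again, so no agent can see another step's output.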
