Add Partially Observable Monte Carlo Tree Search (MCTS) Planner Module with Tests: Python by aabs7 · Pull Request #6 · RAIL-group/RAIL-lsp-dev

aabs7 · 2025-01-07T23:00:44Z

Summary

This PR introduces a Python module implementing the Partially Observable Monte Carlo Tree Search (PO-MCTS) planner. Given a Markov Decision Process (MDP) with probabilistic transitions, action costs, and goal states, the planner computes the best action from the current state and the expected cost of taking it.

Usage

import pytest
from pouct_planner import core

# Format: {state: {action: [(next_state, probability, cost), ...]}}
stochastic_mdp = {
    'S': {'A': [('S1', 0.8, 5), ('S2', 0.2, 50)]},
    'S1': {'C': [('S3', 1.0, 3)]},
    'S2': {},
    'S3': {}
}
# MDP is a testing MDP state class (not defined in this snippet)
state = MDP('S', stochastic_mdp)
best_action, cost = core.po_mcts(state, n_iterations=10000, C=1.0, rollout_fn=None)
assert best_action == 'A'
# Expected cost of action A: 0.8 * (5 + 3) + 0.2 * 50 = 16.4
assert cost == pytest.approx(0.8 * (5 + 3) + 0.2 * 50, abs=1.0)
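
The expected value in the final assertion can be cross-checked by exact recursion over the same dictionary. The following is a minimal sketch and not part of the PR: `expected_cost` is a hypothetical helper, and it assumes the MDP is acyclic and that a state with no actions is terminal with zero remaining cost.

```python
# Hypothetical helper (not part of pouct_planner): exact expected cost by recursion.
stochastic_mdp = {
    'S': {'A': [('S1', 0.8, 5), ('S2', 0.2, 50)]},
    'S1': {'C': [('S3', 1.0, 3)]},
    'S2': {},
    'S3': {},
}


def expected_cost(mdp, state):
    """Return (best_action, expected_cost) for an acyclic MDP given as a dict."""
    actions = mdp[state]
    if not actions:  # Assumption: a state with no actions is terminal
        return None, 0.0
    best_action, best_value = None, float('inf')
    for action, outcomes in actions.items():
        # Expected immediate cost plus expected cost-to-go over all outcomes
        value = sum(prob * (cost + expected_cost(mdp, nxt)[1])
                    for nxt, prob, cost in outcomes)
        if value < best_value:
            best_action, best_value = action, value
    return best_action, best_value


action, value = expected_cost(stochastic_mdp, 'S')
print(action, round(value, 2))  # A 16.4
```

With enough iterations, the planner's estimate should land within the stated tolerance of this exact value.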

Requirements for State

The state class passed to the planner must provide the following attribute and methods.

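class State():
    def __init__(self, ...):
        self.is_goal_state = False  # Set to True when the state is a goal state

    def get_actions(self):
        return [action1, action2, ...]  # List of actions available from the state

    def transition(self, action):
        # Maps each successor state to (probability, cost); probabilities should sum to 1
        return {State(): (prob, cost), State(): (prob, cost), ...}

    def __eq__(self, other):
        return self.hash == other.hash  # Compare states using hash

    def __hash__(self):
        # Defining __eq__ disables the default __hash__ in Python 3;
        # states must be hashable because transition() uses them as dict keys
        return self.hash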
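
As an illustration of these requirements, here is a minimal sketch of a state class that wraps the dictionary format from the Usage section. `DictMDPState` and its `goal_names` argument are hypothetical names chosen for this example; this is not the `MDP` test class from the PR, whose implementation is not shown here.

```python
class DictMDPState:
    """Hypothetical state over {state: {action: [(next_state, prob, cost), ...]}}."""

    def __init__(self, name, mdp, goal_names=()):
        self.name = name
        self.mdp = mdp
        self.goal_names = frozenset(goal_names)
        # Assumption: the caller says which state names are goals
        self.is_goal_state = name in self.goal_names
        self.hash = hash(name)

    def get_actions(self):
        return list(self.mdp[self.name].keys())

    def transition(self, action):
        # Successor state -> (probability, cost)
        return {
            DictMDPState(nxt, self.mdp, self.goal_names): (prob, cost)
            for nxt, prob, cost in self.mdp[self.name][action]
        }

    def __eq__(self, other):
        return isinstance(other, DictMDPState) and self.hash == other.hash

    def __hash__(self):
        return self.hash


mdp = {
    'S': {'A': [('S1', 0.8, 5), ('S2', 0.2, 50)]},
    'S1': {'C': [('S3', 1.0, 3)]},
    'S2': {},
    'S3': {},
}
start = DictMDPState('S', mdp, goal_names={'S3'})
outcomes = start.transition('A')
total_prob = sum(prob for prob, _ in outcomes.values())
```

Because equality and hashing both rely on the state name, two instances built for the same name are interchangeable as dictionary keys.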

Tests

Tests for individual functions used in the planner. For example: the average of rollout costs matches the expected value, rollout costs fall between the minimum and maximum cost for that state, backpropagation correctly updates the node, traversal behaves as expected, and so on.
Tests for the best action and cost in different MDP environments, such as a linear deterministic MDP, large state and action spaces, a stochastic MDP, and so on.
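
The rollout checks described above could look roughly like the sketch below. It is not taken from the PR's test suite: `random_rollout` is a hypothetical stand-in that works directly on the dictionary MDP from the Usage section, and the sample count and tolerance are arbitrary choices.

```python
import random

stochastic_mdp = {
    'S': {'A': [('S1', 0.8, 5), ('S2', 0.2, 50)]},
    'S1': {'C': [('S3', 1.0, 3)]},
    'S2': {},
    'S3': {},
}


def random_rollout(mdp, state, rng):
    """Hypothetical rollout: take uniformly random actions until no action remains."""
    total = 0.0
    while mdp[state]:
        action = rng.choice(sorted(mdp[state]))
        outcomes = mdp[state][action]
        weights = [prob for _, prob, _ in outcomes]
        # Sample one (next_state, prob, cost) outcome according to its probability
        state, _, cost = rng.choices(outcomes, weights=weights, k=1)[0]
        total += cost
    return total


rng = random.Random(0)  # Fixed seed so the check is repeatable
costs = [random_rollout(stochastic_mdp, 'S', rng) for _ in range(20000)]
mean_cost = sum(costs) / len(costs)

# From 'S' the only possible totals are 5 + 3 = 8 and 50
assert all(8 <= c <= 50 for c in costs)
# Each state here has a single action, so the rollout mean estimates 16.4
assert abs(mean_cost - 16.4) < 1.0
```

In this MDP every state has at most one action, so the random rollout mean coincides with the optimal expected cost; in general it is only the expected cost under a random policy.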

Additional Notes

The rollout_fn is customizable. If rollout_fn is not provided, a random rollout from the current state is used.
Successor states are sampled from the distribution returned by the transition function, using the get_chance_node() function.
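
Sampling a successor in proportion to its transition probability can be sketched as follows. This is an illustration only: `sample_outcome` is a hypothetical function, not the PR's `get_chance_node()`, and plain strings stand in for state objects.

```python
import random


def sample_outcome(transitions, rng=random):
    """Sample (next_state, cost) from a {state: (prob, cost)} mapping.

    Hypothetical illustration of probability-weighted sampling.
    """
    states = list(transitions)
    weights = [transitions[s][0] for s in states]
    next_state = rng.choices(states, weights=weights, k=1)[0]
    return next_state, transitions[next_state][1]


# Same shape as the dict returned by transition(), with strings as states
transitions = {'S1': (0.8, 5), 'S2': (0.2, 50)}
rng = random.Random(0)  # Fixed seed for repeatability
samples = [sample_outcome(transitions, rng)[0] for _ in range(10000)]
frac_s1 = samples.count('S1') / len(samples)
```

Over many draws, `frac_s1` should be close to 0.8, the probability assigned to 'S1'.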

aabs7 added 6 commits December 31, 2024 19:11

planner done; tests pass; testing on progress

9fc5322

Add tests for environments, add function tests

84b4b46

remove print statements; test pass, sampling cost fails test sometimes

4067eb3

add total_n to save computation; add test

996ecc7

add goal state to state class

4c476e3

add lower bound rollout fn to failing test

fa90042

aabs7 requested a review from gjstein January 7, 2025 23:00

aabs7 self-assigned this Jan 7, 2025

add readme file

44d8542

aabs7 added the enhancement label Jan 7, 2025

aabs7 added 2 commits January 13, 2025 13:56

added child before rollout from chance node outcome

0229015

po_mcts function returns path, add test for path, all tests pass

2edf219

aabs7 wants to merge 9 commits into main from abhish/pouct-planner
