Graduation date: 2006
A large number of sequential decision-making problems in uncertain environments
can be modeled as Markov Decision Processes (MDPs). In such settings, an agent
observes the state of the environment at each time step and then executes an
action, which causes a stochastic transition to a new state of the environment
and yields a reward. In a finite-horizon MDP, the goal of planning is to
maximize the expected total payoff over the given horizon. MDPs can be solved
by a number of algorithms whose complexity is generally a low-order polynomial
in the number of states and the length of the decision-making horizon.
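
As an illustration of the finite-horizon planning problem described above, the following is a minimal sketch of backward induction (finite-horizon value iteration). The toy MDP, the `transitions` layout, and the function name are assumptions made for this example and do not come from the thesis.

```python
from typing import Dict, List, Tuple

# Assumed layout: transitions[state][action] is a list of
# (probability, next_state, reward) outcomes.
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def finite_horizon_values(transitions: Transitions, horizon: int) -> Dict[int, float]:
    """Return the optimal expected total payoff from each state over `horizon` steps."""
    # With zero steps left, no payoff can be collected.
    values = {state: 0.0 for state in transitions}
    for _ in range(horizon):
        # One backup: best action value given the values for one fewer step.
        values = {
            state: max(
                sum(p * (reward + values[nxt]) for p, nxt, reward in outcomes)
                for outcomes in actions.values()
            )
            for state, actions in transitions.items()
        }
    return values


# Hypothetical two-state, two-action MDP used only for illustration.
TOY_MDP: Transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.5, 1, 1.0), (0.5, 0, 0.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}

if __name__ == "__main__":
    print(finite_horizon_values(TOY_MDP, horizon=2))
```

Each backup touches every state, action, and successor once, so the total work in this sketch grows polynomially with the number of states and linearly with the horizon.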
Interactive computer games are a valuable testbed for AI research in
learning and planning. Like the real-world problems they simulate, they
introduce an additional level of complexity: the agent's sensors
provide only partial information about the state of the environment,
called an observation. These problems can be modeled as partially
observable MDPs (POMDPs). At any point in time, the sequence of
actions taken and observations made by the agent so far determines a
probability distribution over states, called a belief state. It has been
shown that
solving a POMDP can be reduced to solving the corresponding MDP on the
set of belief states. This planning problem, however, rapidly becomes
intractable in large state spaces with a substantial number of observations.
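
To make the notion of a belief state concrete, here is a minimal sketch of the standard Bayesian belief update after one action and one observation. The dictionary-based model layout and the toy "near"/"far" domain are assumptions for illustration only, not the representation used in the thesis.

```python
from typing import Dict, Hashable, Tuple

Belief = Dict[Hashable, float]


def update_belief(
    belief: Belief,
    action: Hashable,
    observation: Hashable,
    trans: Dict[Tuple[Hashable, Hashable], Dict[Hashable, float]],
    obs_model: Dict[Hashable, Dict[Hashable, float]],
) -> Belief:
    """Return the posterior belief after taking `action` and seeing `observation`.

    Assumed layout: trans[(s, a)][s2] = P(s2 | s, a), obs_model[s2][o] = P(o | s2).
    """
    unnormalized: Belief = {}
    for state, prob in belief.items():
        for nxt, p_trans in trans[(state, action)].items():
            likelihood = obs_model[nxt].get(observation, 0.0)
            unnormalized[nxt] = unnormalized.get(nxt, 0.0) + prob * p_trans * likelihood
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalize so the posterior sums to one.
    return {state: weight / total for state, weight in unnormalized.items()}


# Hypothetical domain: an enemy is "near" or "far"; waiting does not move it,
# and a "noise" observation is more likely when the enemy is near.
TOY_TRANS = {("near", "wait"): {"near": 1.0}, ("far", "wait"): {"far": 1.0}}
TOY_OBS = {"near": {"noise": 0.8, "quiet": 0.2}, "far": {"noise": 0.2, "quiet": 0.8}}

if __name__ == "__main__":
    prior = {"near": 0.5, "far": 0.5}
    print(update_belief(prior, "wait", "noise", TOY_TRANS, TOY_OBS))
```

Because the belief is a continuous distribution over states, planning over beliefs has to cope with a space that grows quickly with the number of states and observations, which is the source of the intractability noted above.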
In this thesis, we adapt the work of Kearns, Mansour and Ng on sparse
sampling algorithms to factored POMDP representations of multi-agent
partially observable domains. Applying the adapted algorithm to two domains based
on popular video games, we show empirically how a randomly sampled
look-ahead tree covering only a small fraction of the full look-ahead
tree is sufficient to compute near-optimal policies in these settings. We
compare the performance of this approach with that of classical methods and
conclude that sparse sampling dramatically reduces the running time of
the planning algorithm and scales well with the number of enemy agents.
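
The idea of a randomly sampled look-ahead tree can be sketched as follows. This is a generic version of sparse sampling over a generative model of a fully observable process, not the factored POMDP adaptation developed in the thesis; the simulator interface, the parameter names (`depth`, `width`, `gamma`), and the toy domain are all assumptions made for the example.

```python
import random
from typing import Callable, Dict, Hashable, Sequence, Tuple

# Assumed generative model: simulate(state, action, rng) -> (next_state, reward).
Simulator = Callable[[Hashable, Hashable, random.Random], Tuple[Hashable, float]]


def sparse_sampling_q(
    state: Hashable,
    actions: Sequence[Hashable],
    simulate: Simulator,
    depth: int,
    width: int,
    gamma: float,
    rng: random.Random,
) -> Dict[Hashable, float]:
    """Estimate the value of each action by sampling `width` successors per action."""
    if depth == 0:
        return {action: 0.0 for action in actions}
    estimates: Dict[Hashable, float] = {}
    for action in actions:
        total = 0.0
        for _ in range(width):
            next_state, reward = simulate(state, action, rng)
            # Recurse on the sampled successor with one fewer level of look-ahead.
            future = sparse_sampling_q(
                next_state, actions, simulate, depth - 1, width, gamma, rng
            )
            total += reward + gamma * max(future.values())
        estimates[action] = total / width
    return estimates


def choose_action(state, actions, simulate, depth, width, gamma, rng):
    """Pick the action with the highest sampled estimate at the root."""
    q_values = sparse_sampling_q(state, actions, simulate, depth, width, gamma, rng)
    return max(q_values, key=q_values.get)


def toy_simulate(state, action, rng):
    """Hypothetical domain: "go" pays 1.0 with probability 0.9, "stay" pays nothing."""
    reward = 1.0 if action == "go" and rng.random() < 0.9 else 0.0
    return state, reward


if __name__ == "__main__":
    rng = random.Random(0)
    print(choose_action(0, ["stay", "go"], toy_simulate, depth=2, width=4, gamma=0.9, rng=rng))
```

The number of simulator calls in this sketch is on the order of (number of actions × `width`) raised to the power `depth`, so it depends on the look-ahead parameters rather than on the size of the state space.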