# Matthew E. Taylor's Publications

**Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains**

Matthew E. Taylor, Manish Jain, Prateek Tandon, and Milind
Tambe. **Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains**. In *Proceedings of the IJCAI
2009 Workshop on Distributed Constraint Reasoning*, July 2009.

DCR-2009

### Abstract

Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff
in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize
their on-line reward, a decomposable function dependent on pairs of agents' decisions. Unlike previous work, agents must both
learn the reward function and exploit it on-line, critical properties for a class of physically-motivated systems, such as mobile
wireless networks. This paper introduces algorithms motivated by the *Distributed Constraint Optimization Problem* framework
and demonstrates when, and at what cost, increasing agents' coordination can improve the global reward on such problems.

### BibTeX Entry

@inproceedings(DCR09-Taylor,
author="Matthew E.\ Taylor and Manish Jain and Prateek Tandon and Milind Tambe",
title="Using {DCOP}s to Balance Exploration and Exploitation in Time-Critical Domains",
booktitle="Proceedings of the {IJCAI} 2009 Workshop on Distributed Constraint Reasoning",
month="July",
year="2009",
wwwnote={<a
href="http://www-scf.usc.edu/~wyeoh/DCR09/">DCR-2009</a>},
abstract={Substantial work has investigated balancing exploration
and exploitation, but relatively little has addressed this tradeoff in
the context of coordinated multi-agent interactions. This paper
introduces a class of problems in which agents must maximize their
on-line reward, a decomposable function dependent on pairs of agents'
decisions. Unlike previous work, agents must both learn the reward
function and exploit it on-line, critical properties for a class of
physically-motivated systems, such as mobile wireless networks. This
paper introduces algorithms motivated by the \emph{Distributed
Constraint Optimization Problem} framework and demonstrates when, and
at what cost, increasing agents' coordination can improve the global
reward on such problems.},
)
