Matthew E. Taylor's Publications

Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains

Matthew E. Taylor, Manish Jain, Prateek Tandon, and Milind Tambe. Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains. In Proceedings of the IJCAI 2009 Workshop on Distributed Constraint Reasoning, July 2009.
DCR-2009

Abstract

Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents' decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically-motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the Distributed Constraint Optimization Problem framework and demonstrates when, and at what cost, increasing agents' coordination can improve the global reward on such problems.
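
To make the phrase "a decomposable function dependent on pairs of agents' decisions" concrete, the following is a minimal sketch of a global reward written as a sum of pairwise terms. It is not the paper's algorithm: the three-agent chain, the reward tables, and the names `pair_rewards`, `global_reward`, and `best_assignment` are all hypothetical, and the sketch assumes the tables are already known, whereas the abstract describes agents that must learn them on-line.

```python
from itertools import product

# Hypothetical setup: three agents on a chain (0-1-2), each choosing 0 or 1.
# The reward values are illustrative only and do not come from the paper.
pair_rewards = {
    (0, 1): {(0, 0): 5, (0, 1): 2, (1, 0): 1, (1, 1): 4},
    (1, 2): {(0, 0): 3, (0, 1): 6, (1, 0): 2, (1, 1): 1},
}

def global_reward(assignment, pair_rewards):
    """Sum the pairwise rewards for a joint assignment (one value per agent)."""
    return sum(
        table[(assignment[i], assignment[j])]
        for (i, j), table in pair_rewards.items()
    )

def best_assignment(num_agents, domain, pair_rewards):
    """Brute-force search over all joint assignments; only viable for tiny examples."""
    return max(
        product(domain, repeat=num_agents),
        key=lambda a: global_reward(a, pair_rewards),
    )

if __name__ == "__main__":
    best = best_assignment(3, (0, 1), pair_rewards)
    print(best, global_reward(best, pair_rewards))
```

The brute-force search stands in for coordinated optimization only to show what is being maximized; it ignores both the distributed setting and the exploration cost that the abstract identifies as the core difficulty.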

BibTeX Entry

@inproceedings(DCR09-Taylor,
  author="Matthew E.\ Taylor and Manish Jain and Prateek Tandon and Milind Tambe",
  title="Using {DCOP}s to Balance Exploration and Exploitation in Time-Critical Domains",
  Booktitle="Proceedings of the {IJCAI} 2009 Workshop on Distributed Constraint Reasoning",
  month="July",
  year= "2009",
  wwwnote={<a 
  href="http://www-scf.usc.edu/~wyeoh/DCR09/">DCR-2009</a>},
  abstract={Substantial work has investigated balancing exploration
and exploitation, but relatively little has addressed this tradeoff in
the context of coordinated multi-agent interactions. This paper
introduces a class of problems in which agents must maximize their
on-line reward, a decomposable function dependent on pairs of agents'
decisions. Unlike previous work, agents must both learn the reward
function and exploit it on-line, critical properties for a class of
physically-motivated systems, such as mobile wireless networks. This
paper introduces algorithms motivated by the \emph{Distributed
Constraint Optimization Problem} framework and demonstrates when, and
at what cost, increasing agents' coordination can improve the global
reward on such problems.},
)
