Samenvatting

Summary Overview of exam goals of the Multi-Agent Systems course (XM_0052)

Beoordeling

Verkocht

Pagina's

Geüpload op

12-04-2021

Geschreven in

2020/2021

A summary for the Multi-Agent Systems course given in the Artificial Intelligence master at the VU. Overview of exam goals of the Multi-Agent Systems course (XM_0052)

Instelling

Vak

Voorbeeld van de inhoud

Multi Agent Systems Course VU
Exam Goals
April 12, 2021

1 Game Theory
1.1 Pay-off matrix, BR, Pareto, dominated strategies and IESDS
• We have a Pay-off matrix to determine payoff of simultaneous moves of 2 players. This can be put as
normal-form game which is a tuple of (N,A,u) with N being set of n players, A being actions (each
elements in A called action profile) and utility function (payoff u for each agent depends on the joint
actions of all agents).
• The best response in a pay-off game is the action agent i can choose to get the highest increase in
utility when the action of agent j is known. For a continuous state space, we find the maximum utility
for agent i by taking the partial derivative with respect to action of agent i and setting the derivative
to zero:
ui (x, y) = 2(x + y + bxy) − x2
Where y is the known action of other agent
ui
= 2(1 + by) − 2x = 0
x
xopt = 1 + by

• Pareto optimality is a solution property. A strategy a is Pareto dominated by another strategy a0
if the utility for a0 is higher or equal than for a with at least 1 instance which is strictly higher. A
Pareto dominant strategy is the highest mutual pay-off (but not necessarily the highest social welfare);
deviating from this strategy could increase utility for player 1 but will then decrease for player 2.
• Strictly vs weak dominance: if a set of strategies I can chose are all dominant over another set of
strategies then the first set is strictly dominant. Weakly dominant is for some point equal and other
points dominant.
• An elimination strategy we can use is IESDS: we look at what not to do; if a row or column is strictly
dominant over other row, we delete other row. We can do this recursively. By the nature of first looking
at a dominated column, than row, than column etc, we could end up with an equilibrium which does
not have to imply Pareto optimality, since the Pareto optimal strategy could be deleted.

1.2 Nash equilibrium, Vickrey auction
• A Nash equilibrium is a solution concept based on conditions instead of an algorithm. Nash can be
strict(>) or weak (≥); Nash supposes players have no regrets even though they know what other player
will do; best response to all other agents. This may not be the response with highest welfare (think
of prisoner’s dilemma). Pure Nash can be strict or weak; mixed Nash is always weak. For mixed NE,
agent makes opponent indifferent (for matrix games only). There alwyas exist a NE; but this NE may
be a mixed Nash.
• Computation of Mixed NE: assign p and (1-p) to choices of player 1, assign q and (1-q) to player 2.
Calculate utility via the pay-off matrix. For example with Battle of the Sexes: player 1 wants to choose
p so that player 2 is indifferent (which is the definition of mixed Nash), see fig. 1 for the calculation.

1

, Figure 1: Calculating utility for choosing mixed Nash equilibrium. Utility is 2/3, which is lower than choosing
the same movie; so mixed Nash is not always optimal.

• However, a mixed Nash does not always exists; for example in the Prisoner’s dilemma, the probability
of p and q end up to be higher than 1; this is not possible. Since for every game, a NE must exists,
this implies that the Prisoner’s dilemma has a Pure NE.
• Vickrey auction: n players do a bid for an item; the highest bid will get the item and only has to
pay the second-highest price. This is interesting since no players will have regrets afterwards and the
winner will have positive utility.

1.3 Repeated games, ultimatum game, extensive form, backward induction,
SPNE
• Repeated games: One-state games, states do not change (rock-paper-scissors). Maximize total reward
of finite known number of repetitions. Infinite number of repetitions; maximize discounted total rewards
(future rewards less valuable then present rewards):
inf
X
R= δ t−1 · rt
t=1

Using mathematical derivation, we can rewrite expected R to:
1
E(R) = (r1 + δ · r2 + δ 2 · r3 ...)
1−δ
Xinf
E(R) = δ t−1 · rt
t=1

• Repeated Prisoner’s Dilemma has an interesting strategy: Grim Trigger strategy.
Start by cooperating (best social welfare) until someone defects; then always defect.
If we look at table 1, we can derive the utility for always cooperating and always defecting:

Coop Defect
C 3,3 4,1
D 1,4 2,2

Table 1: Form of Prisoner’s dilemma

2

Meld schending auteursrecht

Geschreven voor

Instelling: Vrije Universiteit Amsterdam (VU)
Studie: MSc Artificial Intelligence
Vak: Multi-Agent Systems (XM_0052)

Alle documenten voor dit vak (3)

Documentinformatie

Geüpload op: 12 april 2021
Aantal pagina's: 8
Geschreven in: 2020/2021
Type: SAMENVATTING

Onderwerpen

multi agent systems
reinforcement learning
game theory

$7.76

Krijg toegang tot het volledige document:

Geschreven door studenten die geslaagd zijn

Direct beschikbaar na je betaling

Online lezen of als PDF

Maak kennis met de verkoper

timdeboer

3.0

(1)

Maak kennis met de verkoper

timdeboer Vrije Universiteit Amsterdam

Bekijk profiel

Volgen

Verkocht

Lid sinds

5 jaar

Aantal volgers

Documenten

Laatst verkocht

1 jaar geleden

3.0

1 beoordelingen

Recent door jou bekeken

Waarom studenten kiezen voor Stuvia

Gemaakt door medestudenten, geverifieerd door reviews

Kwaliteit die je kunt vertrouwen: geschreven door studenten die slaagden en beoordeeld door anderen die dit document gebruikten.

Niet tevreden? Kies een ander document

Geen zorgen! Je kunt voor hetzelfde geld direct een ander document kiezen dat beter past bij wat je zoekt.

Betaal zoals je wilt, start meteen met leren

Geen abonnement, geen verplichtingen. Betaal zoals je gewend bent via iDeal of creditcard en download je PDF-document meteen.

“Gekocht, gedownload en geslaagd. Zo makkelijk kan het dus zijn.”

Alisha Student

Veelgestelde vragen

Wat krijg ik als ik dit document koop?

Je krijgt een PDF, die direct beschikbaar is na je aankoop. Het gekochte document is altijd, overal en oneindig toegankelijk via je profiel.

Tevredenheidsgarantie: hoe werkt dat?

Onze tevredenheidsgarantie zorgt ervoor dat je altijd een studiedocument vindt dat goed bij je past. Je vult een formulier in en onze klantenservice regelt de rest.

Van wie koop ik deze samenvatting?

Stuvia is een marktplaats, je koop dit document dus niet van ons, maar van verkoper timdeboer. Stuvia faciliteert de betaling aan de verkoper.

Zit ik meteen vast aan een abonnement?

Nee, je koopt alleen deze samenvatting voor $7.76. Je zit daarna nergens aan vast.

Is Stuvia te vertrouwen?

4,6 sterren op Google & Trustpilot (+1000 reviews) Afgelopen 30 dagen zijn er 49710 samenvattingen verkocht Opgericht in 2010, al 16 jaar dé plek om samenvattingen te kopen

Summary Overview of exam goals of the Multi-Agent Systems course (XM_0052)

Voorbeeld van de inhoud

Geschreven voor

Documentinformatie

Onderwerpen

Maak kennis met de verkoper

Recent door jou bekeken

Waarom studenten kiezen voor Stuvia

Gemaakt door medestudenten, geverifieerd door reviews

Niet tevreden? Kies een ander document

Betaal zoals je wilt, start meteen met leren

Bezig met je bronvermelding?

Veelgestelde vragen

Wat krijg ik als ik dit document koop?

Tevredenheidsgarantie: hoe werkt dat?

Van wie koop ik deze samenvatting?

Zit ik meteen vast aan een abonnement?

Is Stuvia te vertrouwen?