
Thursday, April 19, 2012

Possible answers to examination questions

In this post, I try to answer examination questions in preparation for my exam. It's a 4th-year course at Nanyang Technological University, "CSC416 Intelligent Agents". The exam papers are from another university, but they are very similar to ours.
 Exam papers can be found here!

Answers to the University of Liverpool's May 2009 COMP310 examination paper:

Question 1

a)
  The intentional stance is a way of understanding and predicting the behavior of a complex system by attributing to it mental states such as beliefs, desires, intentions, hopes, fears, etc. There are other, less abstract stances, such as the physical and design stances. The physical stance explains a system in terms of physical laws and physical attributes (mass, velocity, etc.). The design stance does so in terms of purpose, function and design approach.
The intentional stance is most appropriate and useful when explaining systems with very complex structure; if we try to explain them using the physical or design stance, it can become difficult to grasp the whole picture. We use the intentional stance and attribute mental states to a system because it is convenient for explaining, predicting the behavior of, and repairing or improving the system, not because we actually believe that the complex system has these mental states.
It is legitimate to use the intentional stance when it expresses the same information about the computer system that it expresses about a person.
It is useful to use the intentional stance if it helps us understand the structure of the computer system, its past and future behavior, or how to repair or improve it.

b)

- Agent0 programs are constructed in terms of intentional notions like belief, commitment and intention.
- Agent0 loop:
  1. Read all messages, updating beliefs and commitments if necessary
  2. Execute commitments if capable
  3. Goto 1.
- Agent0 communicates by passing messages of 3 types: request, unrequest and inform. Request and unrequest messages typically result in the agent's commitments being modified, whereas inform messages change the agent's beliefs.
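For intuition, here is a tiny Python sketch of an Agent0-style cycle. The Message and Agent classes and the handler names are my own illustrative inventions, not Agent0 syntax; this is only a sketch of the loop described above.

```python
# Minimal sketch of an Agent0-style control loop (illustrative only; the
# Message/Agent classes and method names below are assumptions, not Agent0).
from dataclasses import dataclass, field

@dataclass
class Message:
    kind: str        # "request", "unrequest" or "inform"
    content: object

@dataclass
class Agent:
    beliefs: set = field(default_factory=set)
    commitments: list = field(default_factory=list)

    def handle(self, msg: Message):
        if msg.kind == "inform":              # inform messages change beliefs
            self.beliefs.add(msg.content)
        elif msg.kind == "request":           # request/unrequest modify commitments
            self.commitments.append(msg.content)
        elif msg.kind == "unrequest" and msg.content in self.commitments:
            self.commitments.remove(msg.content)

    def step(self, inbox):
        for msg in inbox:                     # 1. read messages, update beliefs/commitments
            self.handle(msg)
        for action in list(self.commitments): # 2. execute commitments if capable
            print("executing", action)
            self.commitments.remove(action)
        # 3. the caller loops back to step 1
```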

Question 2

a) 

"Practical Reasoning = Deliberation + means-ends reasoning"
Deliberation = deciding what to do next (which intention to achieve next)
Means-ends reasoning = deciding how to achieve it (the intention obtained from deliberation)
With respect to the code, the functions options() and filter() perform deliberation, while the function plan() performs means-ends reasoning.

b) 

Since the agent is committed to plans only as long as it is rational to be committed to them, it has an open-minded commitment strategy. It achieves this through lines 9, 15, and 19, where it checks the rationality of committing to the current plan.
The agent will reconsider its plans:
  1. at line 8, if the plan has finished (line 9).
  2. at line 20, if the plan is no longer sound, because a better plan may exist (line 19, where the plan is reconsidered based on the current intention I (which could have changed in line 15) and the beliefs B).

c)

The agent is committed to intentions only as long as it is rational to be committed to them. It achieves this through lines 9 and 15, where it checks the rationality of committing to the current intention.
The agent will reconsider its intentions:
  1. at lines 6-7, if the intention has already been achieved or is no longer possible (line 9)
  2. at lines 16-17, if there is a better intention to pursue (line 15)

Question 3

a)

i) v(∅) = 0
ii) v({a}) = 0
iii) v({b}) = 2 - 2 = 0
v) v({a, b}) = 5 + 2 - 2 = 5
vi) v({a, b, c}) = 5 + 4 + 2 = 11
To compute v(S), set every variable in S to true and every other variable to false, then evaluate the rules; for each rule that evaluates to true, add its contribution. Here S could be S = {a}, S = {a, b} or S = {a, b, c}.
For example, for vi) v({a, b, c}): a = true, b = true, c = true.
Now, evaluate each rule:
  • a ∧ b is true, so add 5
  • b is true, so add 2
  • c is true, so add 4
  • b ∧ ¬c is false, so do not add −2
Total: v({a, b, c}) = 5 + 4 + 2 = 11

Another example, for v) v({a, b}): a = true, b = true, c = false.
Now, evaluate each rule:
  • a ∧ b is true, so add 5
  • b is true, so add 2
  • c is false, so do not add 4
  • b ∧ ¬c is true, so add −2
Total: v({a, b}) = 5 + 2 − 2 = 5
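The rule-evaluation procedure above can be written as a few lines of Python. The tuple encoding of each rule as (positive literals, negated literals, weight) is my own; the rules and weights are the ones from the question.

```python
# Sketch of evaluating a characteristic function given as weighted rules
# (the tuple encoding below is my own; the rules/weights are from the question).
rules = [
    ({"a", "b"}, set(), 5),   # a ∧ b   -> 5
    ({"b"}, set(), 2),        # b       -> 2
    ({"c"}, set(), 4),        # c       -> 4
    ({"b"}, {"c"}, -2),       # b ∧ ¬c  -> -2
]

def value(coalition):
    """Sum the weight of every rule whose positive literals are all in the
    coalition and whose negated literals are all absent from it."""
    return sum(w for pos, neg, w in rules
               if pos <= coalition and not (neg & coalition))

for S in [set(), {"a"}, {"b"}, {"a", "b"}, {"a", "b", "c"}]:
    print(sorted(S), value(S))   # prints 0, 0, 0, 5, 11
```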

b)

i) v({a}) = 5, v({b}) = 5 and v({a, b}) = 5 + 5 + 10 = 20
Explanation: the value of a coalition here is the sum of the node weights of its members plus the weights of all edges between members of the coalition.

ii) Let us consider the coalition {a, b}, with v({a, b}) = 20. The core of this game is the set of feasible distributions of the value in which each sub-coalition (in this case {a} and {b}) gets no less value than it could get by standing alone. So possible distributions include: <a, b> = <5, 15>, <6, 14>, <7, 13>, ... <14, 6>, <15, 5>.
Examples of distributions that are not in the core of this game, but are still feasible: <a, b> = <4, 16>, <0, 20>, or anything where a or b gets less than 5.
Note: a feasible distribution is one where the sum of the payoffs does not exceed the coalition value. For example, for the coalition {a, b} with value v({a, b}) = 20, the outcome <5, 16> is not feasible.
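A quick way to sanity-check these distributions is a small feasibility/core test for this two-player game. The function and variable names here are my own:

```python
# Quick check, for the two-player game above, of whether a payoff vector
# <x_a, x_b> is feasible and in the core (function/variable names are mine).
v = {frozenset(): 0, frozenset("a"): 5, frozenset("b"): 5, frozenset("ab"): 20}

def feasible(xa, xb):
    return xa + xb <= v[frozenset("ab")]

def in_core(xa, xb):
    # every sub-coalition must get at least what it can achieve alone
    return feasible(xa, xb) and xa >= v[frozenset("a")] and xb >= v[frozenset("b")]

print(in_core(5, 15))   # True  (in the core)
print(in_core(4, 16))   # False (a gets less than it could get alone)
print(feasible(5, 16))  # False (sums to 21 > 20)
```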

iii) The 3 fairness axioms are: symmetry, dummy player, and additivity.
  • Symmetry: if two players contribute the same value to every coalition, their Shapley values should be the same.
  • Dummy player: if a player contributes to every coalition exactly the value it can achieve alone, then its Shapley value should equal the value it can achieve alone.
  • Additivity: if we combine two games with characteristic functions ν1 and ν2 into one game ν1 + ν2, then player i's Shapley value in the combined game should be the sum of its Shapley values in the two separate games.
Recall from part b) i) that v(∅) = 0, v({a}) = 5, v({b}) = 5 and v({a, b}) = 20, so the Shapley values of a and b are:
Φa(v) = 1/2 (v({a}) − v(∅) + v({a, b}) − v({b})) = 1/2 (5 − 0 + 20 − 5) = 10
Φb(v) = 1/2 (v({b}) − v(∅) + v({a, b}) − v({a})) = 1/2 (5 − 0 + 20 − 5) = 10
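The same Shapley values can be computed mechanically by averaging marginal contributions over all orders in which the players join the coalition, which is exactly what the 1/2 (...) expression does for two players. A rough Python sketch, using my own dictionary representation of the game:

```python
# Sketch: Shapley values by averaging marginal contributions over all
# join orders of the players (the dict-based representation is mine).
from itertools import permutations

v = {frozenset(): 0, frozenset("a"): 5, frozenset("b"): 5, frozenset("ab"): 20}
players = ["a", "b"]

def shapley(player):
    total = 0
    for order in permutations(players):
        before = set()
        for p in order:
            if p == player:
                # marginal contribution of `player` when joining after `before`
                total += v[frozenset(before | {p})] - v[frozenset(before)]
                break
            before.add(p)
    return total / len(list(permutations(players)))

print(shapley("a"), shapley("b"))   # 10.0 10.0
```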

Question 4    skipping this question because we never covered this :)

Question 5

a)

i) In this game of matching pennies there is no pure-strategy Nash equilibrium, but there is a Nash equilibrium in mixed strategies: play heads or tails with 50% probability each. This makes the opponent's strategy choices yield the same expected payoff, so the opponent is indifferent and will also play heads or tails with 50% probability. Neither player wants to deviate from this mixed strategy, so it is a Nash equilibrium.

ii) The outcome in which both agents play the 50/50 mixed strategy is Pareto optimal, because we cannot make one agent better off without making the other agent worse off (the game is zero-sum).
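A quick numeric check of the indifference argument, assuming the standard ±1 matching pennies payoffs (the paper's exact numbers may differ):

```python
# Sketch: with the matcher mixing 50/50 in matching pennies, the other
# player's expected payoff is the same for heads and tails, so no pure
# deviation helps (the ±1 payoffs are the standard ones, assumed here).
p_heads = 0.5   # probability the matcher plays heads

# payoffs to the mismatcher when (matcher, mismatcher) play these moves
payoff = {("H", "H"): -1, ("H", "T"): 1, ("T", "H"): 1, ("T", "T"): -1}

for my_move in ("H", "T"):
    expected = p_heads * payoff[("H", my_move)] + (1 - p_heads) * payoff[("T", my_move)]
    print(my_move, expected)   # both print 0.0 -> indifferent
```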

b)

In the prisoner's dilemma, each agent's dominant strategy is to defect, regardless of the other player's decision, so social welfare is not maximized. To maximize social welfare we can introduce program equilibria: each agent submits a program that decides the agent's action on its behalf. The program says: if the other agent's program is the same as mine, cooperate and get the higher payoff; if the other agent changed its program, defect and get the safe payoff, the same as without program equilibria. So there is no risk, as long as we believe the mechanism running the programs can be trusted.
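A toy sketch of the idea, where each player's "program" cooperates only if the opponent submitted the identical program text. The source-comparison trick is the usual illustration of a program equilibrium; the names below are mine:

```python
# Toy sketch of a program equilibrium: each player submits a program that
# cooperates only if the opponent's program text is identical to its own
# (illustrative only; names and string encoding are my own).
program_1 = "cooperate if other_source == my_source else defect"
program_2 = "cooperate if other_source == my_source else defect"

def run(my_source, other_source):
    return "cooperate" if other_source == my_source else "defect"

print(run(program_1, program_2), run(program_2, program_1))  # cooperate cooperate

# If one player swaps in a defecting program, the other detects the mismatch
# and defects as well, so deviating cannot pay off.
print(run(program_1, "always defect"))  # defect
```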


Ending of May 2009


Answers to the University of Liverpool's May 2008 COMP310 examination paper:

Question 1

a)


The two fundamental problems to be solved when building deliberative or symbolic agents are:

  • Transduction problem: how to translate the real-world information being acquired into an accurate symbolic description, in time for that information to be useful (examples: speech, vision, etc.).
  • Representation/reasoning problem: how to represent information about real-world entities and reason with the acquired symbolic representation in time for the results to be useful (knowledge representation, reasoning, planning, etc.).

b)

This loop first tries to find an action α such that Do(α) can be proved from the current state of the world variables and the predetermined set of rules (the theory about the world). If it cannot find one, it then looks for an action that cannot be explicitly disproved from the current state and the rules. If it still cannot find anything it returns null, otherwise it returns the action found.
ρ → the set of deduction rules about the world, the theory.
Δ → the current state of the world, a database of all variables.
Δ ⊢ρ Do(α) → action α can be proved from the database Δ, using the set of rules ρ (the theory).
The limitation of this approach lies in the complexity of symbol-manipulation algorithms; most of them are computationally intractable.
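Schematically, the loop could look like the following Python sketch, where prove() stands in for a real theorem prover and the Do/not encodings are illustrative assumptions, not the paper's notation:

```python
# Sketch of the deductive action-selection loop described above.
# prove(database, rules, formula) stands in for a real theorem prover;
# it, the ("Do", a) tuples and the actions list are illustrative assumptions.
def select_action(delta, rho, actions, prove):
    for a in actions:                              # try to prove Do(a)
        if prove(delta, rho, ("Do", a)):
            return a
    for a in actions:                              # else: an action not explicitly forbidden
        if not prove(delta, rho, ("not", ("Do", a))):
            return a
    return None                                    # no action can be selected
```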

c)

Calculative rationality means the solution that the agent returns was optimal at the time the agent began deliberating. For this to be acceptable, either the agent's algorithm must be effectively instantaneous or the environment must remain static while the agent deliberates. In the outline architecture given in part b), the agent assumes either that the environment is static or that its computation is fast enough that the resulting action is returned essentially instantaneously.

d)

Bounded optimality means that the agent returns the best solution it can given its constrained computational and other resources.

Question 2

a)

  • B0, B → B0 are the initial beliefs of the agent, B are its current beliefs
  • D → the desires of the agent: the set of options that are most desirable and that the agent can pursue.
  • I0, I → I0 are the initial intentions of the agent, I are its current intentions
  • brf( ) → the belief revision function: updates the beliefs of the agent based on the current percept ρ
  • options( ) → generates the set of possible and most desirable alternatives, the desires.
  • filter( ) → generates the intentions from the set of desires D, the previous intentions and the current beliefs
  • plan( ), π, hd( ), tail( ) → plan() creates a sequence of actions α; the variable π holds this plan; hd() returns the first (head) action of the plan π; tail() returns the rest of the plan after the first action.
  • succeeded( ) → checks whether the intentions I have been achieved, based on the current beliefs B.
  • impossible( ) → checks whether the intentions I have become impossible, based on the current beliefs B.
  • sound( ) → checks whether the plan π is still sound (feasible), based on the current intentions I and beliefs B.
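To see how these pieces fit together, here is a minimal Python sketch of the practical-reasoning control loop built from the components above. Every function passed in is a stub standing in for the paper's version, and the structure only approximates the standard loop the question refers to:

```python
# Minimal sketch of the BDI practical-reasoning loop assembled from the
# components described above. All functions are stubs standing in for the
# paper's versions; this is an illustration, not the exam's exact listing.
def bdi_loop(B0, I0, see, brf, options, filter_, plan, execute,
             succeeded, impossible, sound, reconsider, hd, tail, empty):
    B, I = B0, I0
    while True:
        rho = see()                            # observe the environment
        B = brf(B, rho)                        # belief revision
        D = options(B, I)                      # deliberation: generate desires...
        I = filter_(B, D, I)                   # ...and commit to intentions
        pi = plan(B, I)                        # means-ends reasoning
        while not (empty(pi) or succeeded(I, B) or impossible(I, B)):
            execute(hd(pi))                    # perform the next action of the plan
            pi = tail(pi)
            rho = see()
            B = brf(B, rho)
            if reconsider(I, B):               # meta-level control: re-deliberate?
                D = options(B, I)
                I = filter_(B, D, I)
            if not sound(pi, I, B):            # replan if the plan is no longer sound
                pi = plan(B, I)
```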

b)

“A rational agent is committed to intentions as long as it believes these are achievable and have not been achieved.” The control loop above satisfies this criterion by checking whether the intentions have been achieved or have become impossible, using the functions succeeded(I, B) and impossible(I, B). It will also reconsider its intentions after every action of its plan using the reconsider(I, B) function, which decides whether it is worth deliberating over the intentions again.

Question 3

a)

Ag = {1,2}
ν({1}) = 5,  ν({2}) = 5,  ν({1, 2}) = 20.
A coalitional game is a model that helps us see how different agents, by forming various groups (coalitions), can benefit from cooperation.
The core of a coalitional game is the set of feasible distributions of the coalition's value in which no sub-coalition gets less in total than it could obtain by acting on its own. A distribution in the core is not necessarily fair. (Note: a feasible distribution is one that adds up to no more than the total coalition value.)

b)

Shapley value axioms:
Let N be finite set of players, ν is a function that determines value of a coalition S ⊆ N

  • Symmetry: for all ν, if agents i and j are interchangeable, then their payoffs are equal: ψi(N, ν) = ψj(N, ν). Agents i and j are interchangeable if, for every coalition S containing neither of them, ν(S ∪ {i}) = ν(S ∪ {j}).
  • Dummy player: for all ν, if i is a dummy player then its payoff is ψi(N, ν) = ν({i}). Player i is a dummy if, for every coalition S that does not contain i, ν(S ∪ {i}) − ν(S) = ν({i}).
  • Additivity: for any two characteristic functions ν1, ν2 and any agent i, ψi(N, ν1 + ν2) = ψi(N, ν1) + ψi(N, ν2), where the game (N, ν1 + ν2) is defined by (ν1 + ν2)(S) = ν1(S) + ν2(S). This game is a combination of two different games, and the value of a coalition S in it is the sum of the values that coalition gets when the two games are played separately.
  • Shapley value: given a coalitional game (N, ν), the Shapley value is a distribution of value among the agents (a payoff vector ψ(N, ν)) that divides the full value of the grand coalition and satisfies the axioms of symmetry, dummy player and additivity above. The Shapley value of a player i is defined as:
Φi(ν) = Σ over C ⊆ A \ {i} of [ |C|! (n − |C| − 1)! / n! ] × ( ν(C ∪ {i}) − ν(C) )
where C ranges over subsets of A (the set of agents) not containing i, and n is the number of agents. This is equivalent to averaging the marginal contribution of agent i over all orders in which the agents can join the coalition.
For example, in section a) above, the Shapley values of agents 1 and 2 are:
Φ1(ν) = 1/2 (ν({1}) − ν(∅) + ν({1, 2}) − ν({2})) = 1/2 (5 − 0 + 20 − 5) = 10
Φ2(ν) = 1/2 (ν({2}) − ν(∅) + ν({1, 2}) − ν({1})) = 1/2 (5 − 0 + 20 − 5) = 10
The Shapley value thus gives a fair distribution of the coalitional value among the coalition members.

c)

The following are problems of coalitional games from a computational perspective:
  • How to represent the coalitional game and all its possible coalitions. As can be seen from the Shapley value equation, a naive representation of the characteristic function is exponential in the number of agents n.
  • How to find the best coalition structure out of all possible ones. Finding the coalition structure that maximizes total utility is an NP-complete problem, because we need to search over all possible ways of grouping the n players of the coalitional game.

Question 4    skipping this question because it cannot be viewed

Question 5

a)

A Nash equilibrium is a strategy pair (S1, S2) such that, given that player 2 plays S2, player 1 has no better strategy than S1, and vice versa (given that player 1 plays S1, player 2 has no better strategy than S2). So no player has an incentive to unilaterally deviate from S1 or S2.
The concept of Nash equilibrium is important because it helps us understand how self-interested players will behave when each knows the outcomes of the strategy combinations and the strategies available to the other player.
In the example above, the (defect, defect) strategy pair is a Nash equilibrium. If player i knows that player j's strategy is to defect, player i's best strategy is to defect too. If player i knows that player j's strategy is to cooperate, player i's best strategy is still to defect. In both cases player i's strategy is to defect; since player j knows that and j's payoffs are symmetric, j will choose to defect too.

b)

A Pareto optimal outcome is one for which there is no other outcome that makes some player better off without making some other player worse off. If an outcome is not Pareto optimal, there is a wasted opportunity for improvement; that is why the concept is important.
All outcomes except (defect, defect) are Pareto optimal. (defect, defect) is not Pareto optimal because (cooperate, cooperate) is better for both players. For every other outcome, there is no way to make someone better off without making someone else worse off.
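Assuming a typical prisoner's dilemma payoff matrix (the paper's exact numbers are not reproduced here), a brute-force check of which outcomes are Nash equilibria and which are Pareto optimal might look like this:

```python
# Sketch: brute-force check of Nash equilibria and Pareto optimality for a
# prisoner's dilemma matrix (these particular payoff numbers are assumed;
# the paper's matrix is not reproduced in this post).
from itertools import product

C, D = "cooperate", "defect"
payoff = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}
moves = [C, D]

def is_nash(s1, s2):
    # neither player can gain by unilaterally switching strategy
    best1 = all(payoff[(s1, s2)][0] >= payoff[(alt, s2)][0] for alt in moves)
    best2 = all(payoff[(s1, s2)][1] >= payoff[(s1, alt)][1] for alt in moves)
    return best1 and best2

def is_pareto(outcome):
    # no other outcome is at least as good for both and strictly better for one
    u = payoff[outcome]
    return not any(o != outcome and payoff[o][0] >= u[0] and payoff[o][1] >= u[1]
                   and payoff[o] != u for o in payoff)

for o in product(moves, moves):
    print(o, "Nash" if is_nash(*o) else "", "Pareto" if is_pareto(o) else "")
```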

c)

The Nash equilibrium in the example above is not a Pareto optimal outcome; there is room for improvement. If the agents reveal their negotiation strategies, they can negotiate an outcome where both of them are better off and total social welfare is maximized. An example of such a negotiation strategy is "if the other player cooperates, I will cooperate".

d)

As can be seen from the payoff matrix, when both players cooperate, each of them has a big incentive to defect. If one does, the player who defected gets the best payoff, whereas the player who cooperated gets the worst payoff. Since both of them know that, cooperation in this one-shot case is impossible; they will both choose to defect, to be on the safe side and get at least something better than the worst payoff. However, if we consider a prisoner's dilemma game played over and over again, cooperation can be achieved, as was shown by Axelrod's tournament. In this tournament, players with different kinds of strategies competed against each other to see who would get the largest total payoff over the games they played against one another. The best-performing strategy was Tit-for-Tat, which preferred cooperation: it started out cooperating, retaliated with defection immediately if it was exploited, but went back to cooperating as soon as the other player cooperated again.
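A small sketch of Tit-for-Tat in an iterated prisoner's dilemma, with assumed payoff numbers and an always-defect opponent for comparison:

```python
# Sketch of Tit-for-Tat in the iterated prisoner's dilemma (payoff numbers
# and opponent strategies are assumed for illustration, not from the paper).
C, D = "cooperate", "defect"
payoff = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def tit_for_tat(history):
    return C if not history else history[-1][1]   # copy opponent's last move

def always_defect(history):
    return D

def play(strategy1, strategy2, rounds=5):
    history, score1, score2 = [], 0, 0
    for _ in range(rounds):
        m1 = strategy1(history)
        m2 = strategy2([(b, a) for a, b in history])  # opponent sees mirrored history
        p1, p2 = payoff[(m1, m2)]
        score1, score2 = score1 + p1, score2 + p2
        history.append((m1, m2))
    return score1, score2

print(play(tit_for_tat, tit_for_tat))      # mutual cooperation: (15, 15)
print(play(tit_for_tat, always_defect))    # exploited once, then retaliates: (4, 9)
```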

Ending of May 2008

Resources: 
Lecture slides: http://www.csc.liv.ac.uk/~mjw/pubs/imas/distrib/pdf-index.html
Wiki: Wikipedia
Papers:
Lecture notes: 
Others:
P.S. If I violated your copyright, I am sorry; I will remove the material on request.
P.P.S. Some of the links could be broken; I will try to repair them when discovered. But it's inevitable that they will expire someday, and I will be too old to remember where I got them.
