CS 234 Winter 2021
Assignment 1
Due: January 22 at 6:00 pm (PST)



For submission instructions, please refer to the website. For all problems, if you use an existing result
from either the literature or a textbook to solve the exercise, you need to cite the source.


1 Flappy Karel MDP [25 pts]
There is a hot new mobile game on the market called Flappy Karel, where Karel the robot must
dodge the red pillars of doom and flap its way to the green pasture. Consider the following two grid
environments (Flappy World 1 and Flappy World 2). Starting from any unshaded square, Karel can
either move right & up, or right & down (e.g. from state 4 you can move to state 10 or 12; think
checkers). Actions are deterministic and always succeed unless they would cause Karel to run into a
wall. The thicker edges indicate walls, and attempting to move in the direction of a wall results in
falling down one square (e.g. going in any direction from state 30 leads to falling into state 31). A
successful run by Karel in Flappy World 1 is shown in Figure 1b. Taking any action from the green
target square (no. 32) earns a reward of rg and ends the episode. Taking any action from the red
squares of doom (no. 1, 7, 8, 12, 13, ...) earns a reward of rr and ends the episode. Otherwise, from
every other square, taking any action is associated with a reward rs. Assume the discount factor
γ = 0.9, rg = +5, and rr = −5 unless otherwise specified. Notice that the horizon is technically infinite
in both worlds.
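The dynamics above can be sketched in code. This is a minimal model of Flappy World 1 under one assumption not stated verbatim in the text: that states are numbered column-wise in a 7-row by 5-column grid, so state s sits at row (s − 1) mod 7, column (s − 1) div 7 (this matches the examples 4 → 10/12 and 30 → 31).

```python
# A minimal sketch of the Flappy World 1 dynamics, assuming a 7x5 grid
# numbered column-wise (state s at row (s-1) % 7, column (s-1) // 7).

N_ROWS, N_COLS = 7, 5

def to_rc(s):
    """State number (1-35) -> (row, col)."""
    return (s - 1) % N_ROWS, (s - 1) // N_ROWS

def to_state(r, c):
    """(row, col) -> state number (1-35)."""
    return c * N_ROWS + r + 1

def step(s, action):
    """action: -1 for right & up, +1 for right & down.
    Moving into a wall makes Karel fall down one square instead."""
    r, c = to_rc(s)
    nr, nc = r + action, c + 1
    if nc >= N_COLS or nr < 0 or nr >= N_ROWS:
        # Blocked by a wall: fall down one square (clamped at the bottom row).
        return to_state(min(r + 1, N_ROWS - 1), c)
    return to_state(nr, nc)
```

With this model, `step(4, -1)` gives 10, `step(4, 1)` gives 12, and both actions from state 30 give 31, matching the examples in the problem statement.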

[Figure 1: (a) Flappy World 1; (b) a successful run by Karel in Flappy World 1. Both panels show the same 7×5 grid, with squares numbered column-wise:]

 1  8 15 22 29
 2  9 16 23 30
 3 10 17 24 31
 4 11 18 25 32
 5 12 19 26 33
 6 13 20 27 34
 7 14 21 28 35

(a) Let rs ∈ {−4, −1, 0, 1}. Starting in square 2, for each of the possible values of rs briefly
explain what the optimal policy would be in Flappy World 1. In each case, is the optimal policy
unique, and does the optimal policy depend on the value of the discount factor γ? Explain
your answer. [5 pts]
(b) What value of rs would cause the optimal policy to return the shortest path to the green
target square? Using this value of rs, find the optimal value function for each square in Flappy
World 1. What is the optimal action from square 27? [5 pts]
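The optimal value function asked for in (b) can be computed numerically with value iteration, V(s) ← max_a [r(s, a) + γ·V(s′)] for deterministic transitions. Below is a generic sketch (not the official solution); the tiny three-state chain it runs on is an arbitrary illustration, not Flappy World itself.

```python
# Generic value iteration for a deterministic MDP.
# `transitions` maps (state, action) -> (next_state, reward, done).

def value_iteration(states, actions, transitions, gamma=0.9, tol=1e-8):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                r + (0.0 if done else gamma * V[s2])
                for a in actions
                for (s2, r, done) in [transitions[(s, a)]]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Toy chain: move "right" toward a goal worth +5, or "stay" for 0 reward.
states = [0, 1, 2]
actions = ["right", "stay"]
transitions = {
    (0, "right"): (1, 0, False), (0, "stay"): (0, 0, False),
    (1, "right"): (2, 0, False), (1, "stay"): (1, 0, False),
    (2, "right"): (2, 5, True),  (2, "stay"): (2, 5, True),
}
V = value_iteration(states, actions, transitions)
# V[2] = 5, V[1] = 0.9 * 5 = 4.5, V[0] = 0.9 * 4.5 = 4.05
```

Plugging in the Flappy World transition model and the terminal green/red squares in place of the toy chain would yield the table of optimal values the question asks for.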

Now consider Flappy World 2. It is the same as Flappy World 1, except there are no walls on the
right and left sides. Going past the right end of Flappy World 2 simply loops you to the left-hand side.
Take a look at Figure 2b for a successful run by Karel in Flappy World 2.


[Figure 2: (a) Flappy World 2; (b) a successful run by Karel in Flappy World 2. The grid and numbering are identical to Figure 1, but without walls on the left and right sides.]


(c) Let rs ∈ {−4, −1, 0, 1}. Starting in square 3, for each of the possible values of rs briefly
explain what the optimal policy would be in Flappy World 2. Using the value of rs that
would cause the optimal policy to return the shortest path to the green target square, find the
optimal value function for each square in Flappy World 2. What is the optimal action from
square 27? [5 pts]
(d) Consider a general MDP with rewards and transitions. Consider a discount factor of γ. For
this case assume that the horizon is infinite (so there is no termination). A policy π in this
MDP induces a value function V^π (let us refer to this as V^π_old). Now suppose we have the same
MDP where all rewards have a constant c added to them and then have been scaled by a
constant a (i.e. r_new = a(c + r_old)). Can you come up with an expression for the new value
function V^π_new induced by π in this second MDP in terms of V^π_old, c, a, and γ? [5 pts]
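By linearity of the discounted sum, r_new = a(c + r_old) gives V^π_new(s) = a·V^π_old(s) + a·c/(1 − γ), since the added constant contributes a·c·(1 + γ + γ² + ...) = a·c/(1 − γ). A quick numeric sanity check on a small made-up two-state chain (the chain itself is an arbitrary example, not from the assignment):

```python
# Check V_new(s) = a * V_old(s) + a * c / (1 - gamma) for r_new = a*(c + r_old).

def evaluate(rewards, next_state, gamma, iters=10_000):
    """Policy evaluation V(s) = r(s) + gamma * V(next_state[s]) by iteration,
    for a deterministic chain with per-state rewards."""
    V = [0.0] * len(rewards)
    for _ in range(iters):
        V = [rewards[s] + gamma * V[next_state[s]] for s in range(len(rewards))]
    return V

gamma, a, c = 0.9, 2.0, 3.0
r_old = [1.0, -1.0]
nxt = [1, 0]                       # state 0 -> 1 -> 0 -> ...

V_old = evaluate(r_old, nxt, gamma)
V_new = evaluate([a * (c + r) for r in r_old], nxt, gamma)

shift = a * c / (1 - gamma)        # the constant offset contributed by c
# V_new[s] == a * V_old[s] + shift for every state s
```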


(e) Can scaling all the rewards by a fixed amount change the optimal policy of an MDP? If so,
describe how different ranges of the constant a (where r_new = a · r_old) would change the
optimal policy of the MDP from part (c). [5 pts]
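For intuition on (e): scaling every reward by a > 0 scales every return by a and cannot change which policy is optimal; a = 0 makes every policy optimal; a < 0 reverses the ordering of returns, so the optimal policy can change. A small illustration (an assumption-laden sketch with made-up reward sequences, not the graded answer):

```python
# Scaling rewards by a > 0 preserves the ordering of discounted returns;
# a < 0 reverses it, so the best behaviour can flip.

gamma = 0.9

def discounted_return(rewards):
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Two candidate behaviours in some made-up episodic MDP:
good = [0, 0, 5]      # longer route ending in a big reward
bad = [-5]            # immediate penalty, then termination

g_pos = discounted_return([2.0 * r for r in good])
b_pos = discounted_return([2.0 * r for r in bad])
g_neg = discounted_return([-1.0 * r for r in good])
b_neg = discounted_return([-1.0 * r for r in bad])
# g_pos > b_pos (ordering preserved), but g_neg < b_neg (ordering flipped)
```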


(a) rs = 1: take the longest possible path (2, 10, 18, 24, 30, 31, 32) to the target square.
    rs = 0: take the shortest possible path to the target square.
    rs = −1: take the shortest possible path to the target square.
