Exam (elaborations)

CS 7643 – Quiz 6 Review | Questions and Answers – spring 2026 | 100% Correct – GT.

Grade

A+

Uploaded on

03-03-2026

Written in

2025/2026

🔵 SECTION 1: Markov Decision Processes (MDPs)

Q1. Define a Markov Decision Process (MDP).

Answer:

An MDP is a tuple:

$$(S, A, T, R, \gamma)$$

Where:

• S = set of states
• A = set of actions
• T(s'|s,a) = transition probability, written P(s'|s,a) in the equations that follow
• R(s,a) = reward function, written r(s,a) in the equations that follow
• γ ∈ [0,1] = discount factor

Markov property:

$$P(S_{t+1} \mid S_t) = P(S_{t+1} \mid S_t, \ldots, S_0)$$

Only the present state matters.
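The tuple above maps naturally onto a small data structure. The following is a minimal sketch, assuming a finite MDP stored in plain dictionaries; the class name `MDP`, the helper `is_valid`, and the two-state `toy` example are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    states: tuple   # S
    actions: tuple  # A
    T: dict         # T[(s, a)] -> {next_state: probability}
    R: dict         # R[(s, a)] -> immediate reward
    gamma: float    # discount factor in [0, 1]


def is_valid(mdp, tol=1e-9):
    """Check that every T(.|s,a) is a probability distribution over S."""
    for s in mdp.states:
        for a in mdp.actions:
            dist = mdp.T[(s, a)]
            if any(p < 0 or s2 not in mdp.states for s2, p in dist.items()):
                return False
            if abs(sum(dist.values()) - 1.0) > tol:
                return False
    return 0.0 <= mdp.gamma <= 1.0


# Hypothetical two-state MDP, not taken from the course material.
toy = MDP(
    states=("s0", "s1"),
    actions=("stay", "go"),
    T={
        ("s0", "stay"): {"s0": 1.0},
        ("s0", "go"): {"s0": 0.2, "s1": 0.8},
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "go"): {"s0": 1.0},
    },
    R={("s0", "stay"): 0.0, ("s0", "go"): 1.0,
       ("s1", "stay"): 2.0, ("s1", "go"): 0.0},
    gamma=0.9,
)

print(is_valid(toy))
```

Because the next-state distribution is looked up by the current state and action alone, the Markov property is built into the representation.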

Q2. Define Value Function and Q-Function.

Answer:

Value function:

$$V^\pi(s) = E\left[\sum_{t=0}^{\infty} \gamma^t r_t \mid s_0 = s\right]$$

Q-function:

$$Q^\pi(s,a) = E\left[\sum_{t=0}^{\infty} \gamma^t r_t \mid s_0 = s, a_0 = a\right]$$

Optimal value:

$$V^*(s) = \max_a Q^*(s,a)$$
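The quantity inside both expectations is the discounted return of a single trajectory. As a rough sketch, assuming a finite list of rewards, it can be computed as below; the function names and the numbers are hypothetical. Averaging this return over many trajectories sampled under π would give a Monte Carlo estimate of the value function.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma^t * r_t for one finite reward sequence."""
    g = 0.0
    # Work backwards so each step is g_t = r_t + gamma * g_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def optimal_value_from_q(q_values):
    """V*(s) = max_a Q*(s, a), given {action: Q*(s, a)} for one state."""
    return max(q_values.values())


# Hypothetical numbers for illustration only.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))      # 1 + 0.5 + 0.25 = 1.75
print(optimal_value_from_q({"left": 2.0, "right": 3.0}))  # 3.0
```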

Q3. Write the Bellman Optimality Equation.

$$V^*(s) = \max_a \left[r(s,a) + \gamma \sum_{s'} P(s' \mid s,a) V^*(s')\right]$$

Breaks value into:

• Immediate reward
• Discounted future value
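The two-part split can be made concrete with a single backup for one state. This is a sketch under assumptions: `bellman_backup` is a hypothetical helper, and the rewards, transitions, and value estimates are invented numbers. If `V` were already the optimal value function, the returned value would equal V*(s).

```python
def bellman_backup(s, V, actions, r, P, gamma):
    """Return (best_value, best_action, parts) for one state.

    parts[a] = (immediate reward, discounted future value) for action a.
    """
    parts = {}
    for a in actions:
        future = gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
        parts[a] = (r[(s, a)], future)
    best_action = max(actions, key=lambda a: sum(parts[a]))
    return sum(parts[best_action]), best_action, parts


# Hypothetical one-state example.
actions = ["left", "right"]
r = {("s", "left"): 1.0, ("s", "right"): 0.0}
P = {("s", "left"): {"A": 1.0}, ("s", "right"): {"A": 0.5, "B": 0.5}}
V = {"A": 2.0, "B": 10.0}

value, action, parts = bellman_backup("s", V, actions, r, P, gamma=0.5)
print(value, action, parts)
```

Here "left" has the larger immediate reward (1.0 versus 0.0), but "right" wins because its discounted future value is larger (3.0 versus 1.0).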

🔵 SECTION 2: Dynamic Programming

Q4. Explain Value Iteration.

Answer:

Initialize V(s) arbitrarily.

Iteratively update, for every state s:

$$V(s) \leftarrow \max_a \left[r(s,a) + \gamma \sum_{s'} P(s' \mid s,a) V(s')\right]$$

Repeat until convergence, i.e., until the largest change in V(s) across all states falls below a small threshold.
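The loop described above can be sketched in a few lines. This is a minimal illustration, assuming a finite MDP stored in dictionaries; the stopping threshold `theta`, the iteration cap, and the two-state example are assumptions, not part of the original answer.

```python
def value_iteration(states, actions, P, r, gamma, theta=1e-8, max_iters=10_000):
    """Apply the Bellman optimality update until the max change is below theta."""
    V = {s: 0.0 for s in states}  # arbitrary initialization (all zeros here)
    for _ in range(max_iters):
        new_V = {}
        delta = 0.0
        for s in states:
            new_V[s] = max(
                r[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                for a in actions
            )
            delta = max(delta, abs(new_V[s] - V[s]))
        V = new_V
        if delta < theta:
            break
    return V


# Hypothetical two-state MDP.
states = ("s0", "s1")
actions = ("stay", "go")
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 1.0},
}
r = {("s0", "stay"): 0.0, ("s0", "go"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 0.0}

V = value_iteration(states, actions, P, r, gamma=0.9)
print(V)
```

For this example the fixed point can be checked by hand: staying in s1 forever gives 2 / (1 - 0.9) = 20, and V(s0) solves V = 1 + 0.9 (0.2 V + 0.8 · 20), so V(s0) = 15.4 / 0.82.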

Q5. Explain Policy Iteration.

Answer:

1. Initialize a random policy π

2. Policy Evaluation: compute $V^\pi(s)$ for the current policy

3. Policy Improvement:

$$\pi(s) = \arg\max_a Q^\pi(s,a)$$

4. Repeat steps 2 and 3 until the policy is stable (no state changes its action)
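The four steps can be sketched as follows. This is an illustration under assumptions: policy evaluation is done by iterating the Bellman expectation update rather than by solving a linear system, the initial policy is the first listed action in every state instead of a random one (so the run is reproducible), and the two-state MDP is hypothetical.

```python
def q_value(s, a, V, P, r, gamma):
    """Q(s, a) = r(s, a) + gamma * sum_s' P(s'|s, a) V(s')."""
    return r[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())


def evaluate_policy(policy, states, P, r, gamma, theta=1e-10, max_iters=100_000):
    """Step 2: iterate V(s) <- Q(s, policy(s)) until the values stop changing."""
    V = {s: 0.0 for s in states}
    for _ in range(max_iters):
        delta = 0.0
        for s in states:
            v_new = q_value(s, policy[s], V, P, r, gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            break
    return V


def policy_iteration(states, actions, P, r, gamma, max_rounds=1000):
    policy = {s: actions[0] for s in states}  # step 1 (deterministic start)
    for _ in range(max_rounds):
        V = evaluate_policy(policy, states, P, r, gamma)           # step 2
        new_policy = {
            s: max(actions, key=lambda a: q_value(s, a, V, P, r, gamma))
            for s in states
        }                                                          # step 3
        if new_policy == policy:                                   # step 4
            break
        policy = new_policy
    return policy, V


# Hypothetical two-state MDP.
states = ("s0", "s1")
actions = ("stay", "go")
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 1.0},
}
r = {("s0", "stay"): 0.0, ("s0", "go"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 0.0}

policy, V = policy_iteration(states, actions, P, r, gamma=0.9)
print(policy, V)
```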

🔵 SECTION 3: Q-Learning & Deep Q-Learning

Q6. Write the Q-learning update rule.

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]$$

Q-learning is an off-policy temporal-difference (TD) method: the target takes the max over next actions, regardless of which action the behavior policy actually takes next.
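A tabular version of this update can be sketched as below. The function name, the `done` flag (which zeroes the bootstrap term at terminal states), and the example numbers are assumptions added for illustration; they do not come from the original answer.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma, done=False):
    """Apply one Q-learning update in place and return the TD error."""
    # Off-policy target: greedy max over next actions, whatever is taken next.
    best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in actions)
    td_target = r + gamma * best_next
    td_error = td_target - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


# Hypothetical numbers for illustration only.
actions = ["a", "b"]
Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
Q[("s2", "a")] = 1.0
Q[("s2", "b")] = 2.0

err = q_learning_update(Q, "s", "a", r=1.0, s_next="s2",
                        actions=actions, alpha=0.5, gamma=0.5)
print(err, Q[("s", "a")])  # target = 1 + 0.5 * 2 = 2, so Q moves halfway to 2
```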

Q7. Compute Q-update (GT-style numeric problem).

Given:

• γ = 0.8
• State B, action Up
• Enter state C
• Reward = 3
• max_a Q(C,a) = 5
• Q(B,Up) = 8

Update (no learning rate is given; the calculation uses α = 1):

$$Q(B,\text{Up}) = 8 + (3 + 0.8 \cdot 5 - 8) = 8 + (3 + 4 - 8) = 7$$
MSE Loss:


Written for

Institution: Georgia Institute Of Technology
Course: CS 7643


Document information

Uploaded on: March 3, 2026
Number of pages: 18
Written in: 2025/2026
Type: Exam (elaborations)
Contains: Questions & answers

Get to know the seller

Wiseman
