CS 7643 / CS7643 QUIZ 5 (LATEST UPDATE 2025 /
2026) DEEP LEARNING | QUESTIONS & ANSWERS |
GRADE A | 100% CORRECT - GEORGIA TECH
Reinforcement learning .....ANSWER.....Sequential decision
making in an environment with evaluative feedback
Environment: may be unknown, non-linear, stochastic and complex
Agent: learns a policy to map states of the environment to
actions
- seeks to maximize long-term reward
RL: Evaluative Feedback .....ANSWER.....- Pick an action, receive
a reward
- No supervision for what the correct action is or would have
been (unlike supervised learning)
RL: Sequential Decisions .....ANSWER.....- Plan and execute
actions over a sequence of states
- Reward may be delayed, requiring optimization of future
rewards (long-term planning)
Signature Challenges in RL .....ANSWER.....Evaluative Feedback:
Need trial and error to find the right action
Delayed Feedback: Actions may not lead to immediate reward
Non-stationarity: Data distribution of visited states changes when
the policy changes
Fleeting Nature: of online data (may only see data once)
MDP .....ANSWER.....Framework underlying RL
S: Set of states
A: Set of actions
R: Distribution of Rewards
T: Transition probability
γ: Discount factor
Markov Property: Current state completely characterizes state of
the environment
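The (S, A, R, T, γ) tuple above can be sketched as a small container. This is a hypothetical toy MDP, not from the course; all states, actions, rewards, and probabilities below are made-up illustrative values.

```python
from typing import NamedTuple, Dict, List, Tuple

class MDP(NamedTuple):
    """Container mirroring the (S, A, R, T, gamma) tuple."""
    states: List[int]                                # S: set of states
    actions: List[int]                               # A: set of actions
    rewards: Dict[Tuple[int, int], float]            # R: r(s, a)
    transitions: Dict[Tuple[int, int, int], float]   # T: p(s' | s, a)
    gamma: float                                     # discount factor

# Toy 2-state, 2-action example (all numbers are assumptions for illustration)
mdp = MDP(
    states=[0, 1],
    actions=[0, 1],
    rewards={(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0},
    transitions={(0, 0, 0): 0.8, (0, 0, 1): 0.2,   # p(s'|s=0, a=0)
                 (0, 1, 0): 0.1, (0, 1, 1): 0.9,   # p(s'|s=0, a=1)
                 (1, 0, 0): 0.5, (1, 0, 1): 0.5,   # p(s'|s=1, a=0)
                 (1, 1, 0): 0.0, (1, 1, 1): 1.0},  # p(s'|s=1, a=1)
    gamma=0.9,
)
```

The Markov property is reflected in the transition table: p(s'|s, a) depends only on the current state and action, not on any earlier history.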
RL: Equations relating optimal quantities .....ANSWER.....1. V*(s)
= max_a Q*(s, a)
2. π*(s) = argmax_a Q*(s, a)
V*(s) .....ANSWER.....max_a { sum_(s') p(s'|s, a) [r(s, a) + γV*(s')] }
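The Bellman optimality equation above can be turned into value iteration: repeatedly apply the max-over-actions backup until V converges, then read off the greedy policy π*(s) = argmax_a Q*(s, a). This is a minimal sketch on a hypothetical 2-state, 2-action MDP; all rewards and transition probabilities are made up for illustration.

```python
gamma = 0.9            # discount factor
states = [0, 1]
actions = [0, 1]

# r[s][a]: immediate reward for taking action a in state s (assumed values)
r = [[1.0, 0.0],
     [0.0, 2.0]]

# T[s][a][s2]: transition probability p(s2 | s, a); each row sums to 1
T = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.0, 1.0]]]

def q(s, a, V):
    """Q(s, a) = sum_{s'} p(s'|s, a) [r(s, a) + gamma * V(s')]."""
    return sum(T[s][a][s2] * (r[s][a] + gamma * V[s2]) for s2 in states)

# Value iteration: synchronous Bellman optimality backups
V = [0.0, 0.0]
for _ in range(200):
    V = [max(q(s, a, V) for a in actions) for s in states]

# Greedy policy: pi*(s) = argmax_a Q*(s, a)
pi = [max(actions, key=lambda a: q(s, a, V)) for s in states]
```

With these particular numbers, action 1 leads toward the absorbing state 1 and its repeated reward of 2, so the greedy policy picks action 1 everywhere and V*(1) converges to 2/(1 - γ) = 20.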