Class notes ECO548528

Pages
10
Uploaded on
19-04-2025
Written in
2024/2025


UNIT – 4: SEMI-SUPERVISED LEARNING, REINFORCEMENT LEARNING

MARKOV DECISION PROCESS
A Markov Decision Process, or MDP, is used to formalize reinforcement learning problems. If the
environment is completely observable, its dynamics can be modeled as a Markov process. In an
MDP, the agent constantly interacts with the environment and performs actions; after each action,
the environment responds and generates a new state.




An MDP is used to describe the environment for RL, and almost all RL problems can be
formalized using an MDP. An MDP is a tuple of four elements (S, A, Pa, Ra):
➢ A finite set of states S
➢ A finite set of actions A
➢ A transition probability Pa: the probability of moving from state S to state S' due to action a
➢ A reward Ra received after transitioning from state S to state S' due to action a
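The four-element tuple above can be written down concretely. Below is a minimal Python sketch using a made-up two-state example; the state names, actions, probabilities, and rewards are illustrative assumptions, not taken from the notes:

```python
S = ["s0", "s1"]          # finite set of states
A = ["stay", "move"]      # finite set of actions

# Pa[(s, a)] maps each next state s' to the transition probability P(s' | s, a)
Pa = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "move"): {"s1": 0.9, "s0": 0.1},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.9, "s1": 0.1},
}

# Ra[(s, a, s')] is the reward received after reaching s' from s via action a
Ra = {
    ("s0", "move", "s1"): 1.0,   # reaching s1 is rewarded (hypothetical)
}

# Each row of Pa must be a probability distribution over next states
for (s, a), dist in Pa.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```

Since S and A are finite here, this is also an example of a finite MDP in the sense defined below.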

Markov Property: It says, "If the agent is in the current state s1, performs an action a1,
and moves to state s2, then the state transition from s1 to s2 depends only on the current state and
action; future actions and states do not depend on past actions, rewards, or states."

In other words, as per the Markov property, the current state transition does not depend on any past
action or state. Hence, an MDP is an RL problem that satisfies the Markov property. For example, in a
chess game, the players focus only on the current position and do not need to remember past actions or states.

Finite MDP: A finite MDP is one with finite states, finite rewards, and finite actions. In RL,
we consider only finite MDPs.

Markov Process: A Markov process is a memoryless process with a sequence of random states S1, S2,
......, St that satisfies the Markov property. A Markov process is also known as a Markov chain, which is a
tuple (S, P) of a state set S and a transition function P. These two components (S and P) define the
dynamics of the system.
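The pair (S, P) is enough to simulate a Markov chain: the next state is sampled using only the current state, which is exactly the memoryless property. A minimal sketch, using a hypothetical two-state weather chain (the states and probabilities are made up for illustration):

```python
import random

# Transition function P: P[s][s'] is the probability of moving from s to s'
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"rainy": 0.6, "sunny": 0.4},
}

def step(state, rng):
    """Sample the next state from P[state] -- no history is consulted."""
    states = list(P[state])
    weights = [P[state][s] for s in states]
    return rng.choices(states, weights=weights)[0]

rng = random.Random(0)          # seeded for reproducibility
chain = ["sunny"]
for _ in range(5):
    chain.append(step(chain[-1], rng))
print(chain)                    # a sequence of random states S1, S2, ..., St
```

Note that `step` receives only the current state, never the earlier part of `chain`; that restriction is the Markov property in code.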

BELLMAN EQUATION
The Bellman equation was introduced by the mathematician Richard Ernest Bellman in 1953, and
hence it is called the Bellman equation. It is associated with dynamic programming and is
used to calculate the value of a decision problem at a certain point by including the values of
successor states.

It is a way of calculating the value functions in dynamic programming, and it leads to
modern reinforcement learning. The key elements used in the Bellman equation are:
➢ The action performed by the agent, referred to as "a"
➢ The state the agent is in (or reaches by acting), "s"
➢ The reward/feedback obtained for each good or bad action, "R"
➢ The discount factor, gamma "γ"

The Bellman equation can be written as:
V(s) = max [R(s,a) + γV(s')]
where the maximum is taken over the available actions a, and:
V(s) = the value calculated at a particular state s.
R(s,a) = the reward obtained at state s by performing action a.
γ = the discount factor.
V(s') = the value of the next state s'.
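The equation can be sketched as a small function. The tiny chain below is a hypothetical example (the state names, transitions, and reward are assumptions, not from the notes); it applies one Bellman backup per state, working backward from the terminal state:

```python
GAMMA = 0.9  # discount factor γ

# Deterministic successor and reward tables for a hypothetical 3-state chain
next_state = {("s1", "right"): "s2", ("s2", "right"): "s3"}
reward     = {("s2", "right"): 1.0}   # reward 1 for entering the goal s3

def bellman_value(s, V, actions=("right",)):
    """Return max over actions a of R(s, a) + GAMMA * V(s')."""
    candidates = []
    for a in actions:
        if (s, a) in next_state:
            s_next = next_state[(s, a)]
            candidates.append(reward.get((s, a), 0.0) + GAMMA * V[s_next])
    return max(candidates) if candidates else 0.0

V = {"s1": 0.0, "s2": 0.0, "s3": 0.0}  # terminal state s3 keeps value 0
V["s2"] = bellman_value("s2", V)       # 1.0 + 0.9 * 0   = 1.0
V["s1"] = bellman_value("s1", V)       # 0.0 + 0.9 * 1.0 = 0.9
```

Updating states from the terminal state backward means each V(s') is already final when it is used, which is the same order the worked example below follows.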

For 1st block:
V(s3) = max [R(s,a) + γV(s')], here V(s') = 0 because there is no further state to move to.
V(s3) = max[R(s,a)] => V(s3) = max[1] => V(s3) = 1.

For 2nd block:
V(s2) = max [R(s,a) + γV(s')], here γ = 0.9 (say), V(s') = 1, and R(s,a) = 0, because there is no reward
at this state.
V(s2) = max[0.9(1)] => V(s2) = max[0.9] => V(s2) = 0.9

For 3rd block:
V(s1) = max [R(s,a) + γV(s')], here γ = 0.9, V(s') = 0.9, and R(s,a) = 0, because there is no reward
at this state either.
V(s1) = max[0.9(0.9)] => V(s1) = max[0.81] => V(s1) = 0.81

For 4th block:
V(s5) = max [R(s,a) + γV(s')], here γ = 0.9, V(s') = 0.81, and R(s,a) = 0, because there is no
reward at this state either.
V(s5) = max[0.9(0.81)] => V(s5) = max[0.729] => V(s5) = 0.729 ≈ 0.73

For 5th block:
V(s9) = max [R(s,a) + γV(s')], here γ = 0.9, V(s') = 0.729, and R(s,a) = 0, because there is no
reward at this state either.
V(s9) = max[0.9(0.729)] => V(s9) = max[0.6561] => V(s9) = 0.6561 ≈ 0.66
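The five backups above can be checked programmatically. With γ = 0.9, reward 1 at the goal block, and reward 0 everywhere else, each step back along the path simply multiplies the value by 0.9:

```python
GAMMA = 0.9  # discount factor γ, same as in the worked example

def backup(reward, v_next):
    """One Bellman backup with a single available action: R + γ·V(s')."""
    return reward + GAMMA * v_next

v_s3 = backup(1.0, 0.0)   # goal block: reward 1, no further state -> 1
v_s2 = backup(0.0, v_s3)  # 0.9 * 1      = 0.9
v_s1 = backup(0.0, v_s2)  # 0.9 * 0.9    = 0.81
v_s5 = backup(0.0, v_s1)  # 0.9 * 0.81   = 0.729  (≈ 0.73)
v_s9 = backup(0.0, v_s5)  # 0.9 * 0.729  = 0.6561 (≈ 0.66)
```

The max over actions is omitted here because each block in the example has a single action along the path, so the backup reduces to R + γV(s').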

Professor(s)
Nitesh
