UNIT – 4: SEMI-SUPERVISED LEARNING, REINFORCEMENT LEARNING
MARKOV DECISION PROCESS
A Markov Decision Process, or MDP, is used to formalize reinforcement learning problems. If the
environment is completely observable, then its dynamics can be modeled as a Markov Process. In an
MDP, the agent constantly interacts with the environment and performs actions; after each action, the
environment responds and generates a new state.
An MDP is used to describe the environment for RL, and almost all RL problems can be
formalized using an MDP. An MDP is a tuple of four elements (S, A, Pa, Ra), sketched in code after this list:
➢ A finite set of states S
➢ A finite set of actions A
➢ A reward Ra(s, s') received after transitioning from state s to state s' due to action a
➢ A transition probability Pa(s, s'), the probability that action a taken in state s leads to state s'
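As a concrete illustration, the following Python sketch stores this four-element tuple as plain dictionaries; the three states, two actions, probabilities, and reward values here are invented placeholders, not part of the notes.

    # A minimal sketch of a finite MDP (S, A, Pa, Ra); all values are illustrative.
    states = ["s1", "s2", "s3"]
    actions = ["left", "right"]

    # P[(s, a)] maps each possible next state s' to its probability Pa(s, s').
    P = {
        ("s1", "right"): {"s2": 1.0},
        ("s2", "right"): {"s3": 0.8, "s1": 0.2},
        ("s2", "left"):  {"s1": 1.0},
        ("s3", "left"):  {"s2": 1.0},
    }

    # R[(s, a, s')] is the reward Ra(s, s'); transitions not listed give reward 0.
    R = {("s2", "right", "s3"): 1.0}

    def reward(s, a, s_next):
        return R.get((s, a, s_next), 0.0)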
Markov Property: It states that "if the agent is present in the current state s1, performs an action a1,
and moves to the state s2, then the state transition from s1 to s2 depends only on the current state;
future actions and states do not depend on past actions, rewards, or states."
In other words, as per the Markov Property, the current state transition does not depend on any past
action or state. Hence, an MDP is an RL problem that satisfies the Markov property. For example, in a Chess
game, the players only focus on the current board position and do not need to remember past actions or states.
Finite MDP: A finite MDP is one in which the sets of states, rewards, and actions are all finite. In RL,
we consider only finite MDPs.
Markov Process: A Markov Process is a memoryless process with a sequence of random states S1, S2,
..., St that satisfies the Markov Property. A Markov process is also known as a Markov chain, which is a
tuple (S, P) of a state set S and a transition function P. These two components (S and P) define the
dynamics of the system.
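The memoryless dynamics are easy to see in code. The sketch below, a made-up two-state weather chain used purely for illustration, samples a trajectory in which each next state is drawn using only the current state:

    import random

    # A toy Markov chain (S, P); states and probabilities are invented.
    P_chain = {
        "sunny": {"sunny": 0.8, "rainy": 0.2},
        "rainy": {"sunny": 0.4, "rainy": 0.6},
    }

    def step(state):
        # The next state depends only on the current state (Markov Property).
        next_states = list(P_chain[state])
        weights = list(P_chain[state].values())
        return random.choices(next_states, weights=weights)[0]

    state = "sunny"
    trajectory = [state]
    for _ in range(10):
        state = step(state)
        trajectory.append(state)
    print(trajectory)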
BELLMAN EQUATION
The Bellman equation was introduced by the mathematician Richard Ernest Bellman in 1953,
and hence it is called the Bellman equation. It is associated with dynamic programming and is
used to calculate the value of a decision problem at a certain point by including the value of the
state that follows it.
It is a way of calculating the value functions in dynamic programming, and it leads to
modern reinforcement learning. The key elements used in the Bellman equation are:
➢ The action performed by the agent, referred to as "a"
➢ The state occupied by the agent, referred to as "s"
➢ The reward/feedback obtained for each good or bad action, referred to as "R"
➢ The discount factor, Gamma "γ"
The Bellman equation can be written as:
V(s) = max_a [R(s,a) + γV(s')]
Where,
V(s) = the value calculated for state s
R(s,a) = the reward obtained in state s by performing action a
γ = the discount factor, with 0 ≤ γ ≤ 1
V(s') = the value of the next state s'
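Applied repeatedly, this equation gives the standard value-iteration procedure. The sketch below reuses the dict-based MDP placeholders from the earlier section (states, actions, P, reward); the sweep count and the averaging over stochastic next states are ordinary value-iteration conventions rather than anything specific to these notes.

    def bellman_backup(V, states, actions, P, reward, gamma=0.9):
        # One sweep of V(s) = max_a [R(s,a) + γV(s')], averaging over
        # possible next states s' when a transition is stochastic.
        new_V = {}
        for s in states:
            candidates = []
            for a in actions:
                if (s, a) not in P:
                    continue  # action a is not available in state s
                candidates.append(sum(p * (reward(s, a, s2) + gamma * V[s2])
                                      for s2, p in P[(s, a)].items()))
            new_V[s] = max(candidates) if candidates else 0.0
        return new_V

    # Start from all-zero values and repeat the backup until the values settle.
    V = {s: 0.0 for s in states}
    for _ in range(100):
        V = bellman_backup(V, states, actions, P, reward)
    print(V)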
The equation can be illustrated with a worked example on a grid of blocks: the agent moves block by
block toward a goal block, reaching the goal yields a reward of 1, every other move yields a reward
of 0, and the values are filled in backwards starting from the block adjacent to the goal.
For 1st block:
V(s3) = max [R(s,a) + γV(s')], here V(s') = 0 because there is no further state to move to.
V(s3) = max[R(s,a)] => V(s3) = max[1] => V(s3) = 1.
For 2nd block:
V(s2) = max [R(s,a) + γV(s')], here γ = 0.9 (say), V(s') = 1, and R(s,a) = 0, because there is no reward
at this state.
V(s2) = max[0.9(1)] => V(s2) = max[0.9] => V(s2) = 0.9
For 3rd block:
V(s1) = max [R(s,a) + γV(s')], here γ = 0.9, V(s') = 0.9, and R(s,a) = 0, because there is no reward
at this state either.
V(s1) = max[0.9(0.9)] => V(s1) = max[0.81] => V(s1) = 0.81
For 4th block:
V(s5) = max [R(s,a) + γV(s')], here γ = 0.9, V(s') = 0.81, and R(s,a) = 0, because there is no
reward at this state either.
V(s5) = max[0.9(0.81)] => V(s5) = max[0.729] => V(s5) ≈ 0.73
For 5th block:
V(s9) = max [R(s,a) + γV(s')], here γ = 0.9, V(s') = 0.73, and R(s,a) = 0, because there is no
reward at this state either.
V(s9) = max[0.9(0.73)] => V(s9) = max[0.657] => V(s9) ≈ 0.66
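Since every block except the goal has reward 0, the backward pass above is just repeated multiplication by γ. The few lines below reproduce the arithmetic; the block labels s3, s2, s1, s5, s9 follow the example, while the grid layout itself is assumed.

    gamma = 0.9
    V = {}
    V["s3"] = 1.0              # goal block: reward 1, no further state
    V["s2"] = gamma * V["s3"]  # 0.9
    V["s1"] = gamma * V["s2"]  # 0.81
    V["s5"] = gamma * V["s1"]  # 0.729, quoted as 0.73
    V["s9"] = gamma * V["s5"]  # 0.6561, quoted as 0.66
    print(V)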