49 Questions and Answers|2025 Update|100%
Correct-GT
Reinforcement learning - answer-Sequential decision making in an
environment with evaluative feedback
Environment: may be unknown, non-linear, stochastic and complex
Agent: learns a policy that maps states of the environment to actions
- seeks to maximize long-term reward
RL: Evaluative Feedback - answer-- Pick an action, receive a reward
- No supervision for what the correct action is or would have been (unlike
supervised learning)
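As an illustration of evaluative feedback, here is a minimal sketch of an epsilon-greedy learner on a two-armed bandit. The bandit setting, the function name `run_bandit`, and the values of `epsilon`, `steps`, and the reward probabilities are all assumptions made for this example; the notes above do not name a specific algorithm. The key point is that the learner only ever observes the reward for the action it took, never the correct action.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy value estimation on a Bernoulli bandit (hypothetical helper)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # running mean reward per action
    counts = [0] * len(reward_probs)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action (trial and error).
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(len(estimates)), key=estimates.__getitem__)
        # Evaluative feedback: only the reward of the chosen action is revealed.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.8])
```

With enough steps the estimate for the better arm should end up higher, even though no label ever said which arm was correct.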
RL: Sequential Decisions - answer-- Plan and execute actions over a sequence
of states
- Reward may be delayed, requiring optimization of future rewards (long-term
planning)
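To make the delayed-reward point concrete, here is a minimal sketch of the discounted return that an agent might optimize. The discount factor `gamma` and the helper name `discounted_returns` are assumptions for illustration; the notes do not specify a discounting scheme.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every step t (hypothetical helper)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A reward that only arrives at the final step still gives earlier steps credit.
delayed = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
print(delayed)  # [0.25, 0.5, 1.0]
```

Earlier steps receive a smaller share of the final reward, which is one way to express that actions must be chosen with future rewards in mind.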
Signature Challenges in RL - answer-Evaluative Feedback: Need trial and error
to find the right action
REINFORCE Algorithm - answer-1. Gather data using the current policy
- Sample trajectories tau by acting according to pi_theta
2. Compute the policy gradient estimate
- sum_t(grad_theta log pi_theta(a_t | s_t)) * sum_t(R(s_t, a_t)), averaged over the sampled trajectories
3. Update policy parameters
- theta = theta + alpha * (policy gradient)
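The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the toy environment (a single state where action 1 pays reward 1 and action 0 pays 0), the softmax policy over two logits, and the names `sample_trajectory`, `reinforce_gradient`, and `train` are all hypothetical, and no baseline or discounting is used.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_trajectory(theta, rng, horizon=5):
    """Step 1: act with the current policy; the toy env pays 1 for action 1, else 0."""
    probs = softmax(theta)
    actions = [0 if rng.random() < probs[0] else 1 for _ in range(horizon)]
    rewards = [float(a == 1) for a in actions]
    return actions, rewards

def reinforce_gradient(theta, trajectories):
    """Step 2: average over trajectories of sum_t grad log pi(a_t) * sum_t R."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for actions, rewards in trajectories:
        total_reward = sum(rewards)
        for a in actions:
            for k in range(len(theta)):
                # d/d theta_k of log softmax(theta)[a] is 1[k == a] - probs[k].
                grad[k] += ((1.0 if k == a else 0.0) - probs[k]) * total_reward
    return [g / len(trajectories) for g in grad]

def train(steps=200, alpha=0.05, batch=10, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # start from a uniform policy
    for _ in range(steps):
        trajectories = [sample_trajectory(theta, rng) for _ in range(batch)]
        grad = reinforce_gradient(theta, trajectories)
        # Step 3: gradient ascent on expected reward.
        theta = [t + alpha * g for t, g in zip(theta, grad)]
    return theta

theta = train()
final_probs = softmax(theta)
```

After training, `final_probs[1]` should be close to 1, since the policy is pushed toward the rewarded action. In practice a baseline is usually subtracted from the reward sum to reduce the variance of this estimate.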
- Weak augmentation isn't so severe that the pseudo-labels are bad
- Using strong augmentation to make the NN learn better feature
representations
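One way the weak/strong augmentation idea could be expressed as a loss is sketched below. It assumes a FixMatch-style setup, which the notes do not name: the pseudo-label comes from the model's prediction on the weakly augmented view, and it is only used when its confidence exceeds a threshold. The function name `pseudo_label_loss` and the threshold value are hypothetical.

```python
import math

def pseudo_label_loss(weak_probs, strong_probs, threshold=0.9):
    """Cross-entropy of the strong-view prediction against the weak-view pseudo-label.

    Returns None when the weak-view prediction is not confident enough to trust.
    """
    confidence = max(weak_probs)
    if confidence < threshold:
        return None  # skip this unlabeled example
    pseudo_label = weak_probs.index(confidence)
    return -math.log(strong_probs[pseudo_label])

# Confident weak-view prediction: the strong view is trained toward class 0.
confident = pseudo_label_loss([0.95, 0.05], [0.5, 0.5])
# Unconfident weak-view prediction: the example contributes no loss.
uncertain = pseudo_label_loss([0.6, 0.4], [0.5, 0.5])
```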
Meta-Learning (Few-Shot Learning) - answer-- Learning to learn
- Learn an NN initialization such that, after a few SGD steps on a small amount of
labeled data, the network performs well on the new task
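The idea of learning an initialization can be sketched on a toy problem. Note the substitution: this uses a Reptile-style first-order outer update rather than backpropagating through the inner steps, and the notes do not say which method is meant. The task family (lines `y = slope * x` with slopes drawn from 1 to 3), the one-parameter model, and the names `adapt` and `meta_train` are all assumptions for illustration.

```python
import random

XS = [-1.0, 0.5, 1.0]  # a tiny "few-shot" set of inputs shared by all tasks

def adapt(w, slope, inner_lr=0.1, inner_steps=3):
    """Run a few gradient steps on one task: fit y = w * x to y = slope * x."""
    for _ in range(inner_steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - slope * x) * x for x in XS) / len(XS)
        w -= inner_lr * grad
    return w

def meta_train(meta_steps=500, meta_lr=0.1, seed=0):
    rng = random.Random(seed)
    w_init = 0.0
    for _ in range(meta_steps):
        slope = rng.uniform(1.0, 3.0)   # sample a task
        w_task = adapt(w_init, slope)   # inner loop: a few gradient steps
        # Reptile-style outer update: move the initialization toward the adapted weight.
        w_init += meta_lr * (w_task - w_init)
    return w_init

w_init = meta_train()
```

The learned `w_init` should settle near the middle of the task distribution, so that the same few adaptation steps get closer to a new task than they would starting from zero.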
Surrogate Tasks (Self-Supervised Learning) - answer-- Identify loss functions for
tasks we don't care about in themselves, but that allow us to learn good feature representations
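As one possible example of a surrogate task, rotation prediction turns unlabeled images into a labeled classification problem: rotate each image by a multiple of 90 degrees and ask the network to predict which rotation was applied. The choice of rotation prediction and the helper names `rotate90` and `make_rotation_dataset` are assumptions; the notes do not name a specific task. Images are plain nested lists here to keep the sketch dependency-free.

```python
def rotate90(image):
    """Rotate a 2D list-of-lists image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def make_rotation_dataset(images):
    """Build (rotated_image, rotation_label) pairs from unlabeled images.

    The label k in {0, 1, 2, 3} means the image was rotated k * 90 degrees.
    """
    dataset = []
    for image in images:
        rotated = [list(row) for row in image]  # copy so the input is untouched
        for k in range(4):
            dataset.append((rotated, k))
            rotated = rotate90(rotated)
    return dataset

unlabeled = [[[1, 2], [3, 4]]]
dataset = make_rotation_dataset(unlabeled)
labels = [label for _, label in dataset]
```

A classifier trained on `dataset` needs no human labels, and the features it learns can then be reused for the task we actually care about.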