
Given the Grid World as shown in Figure (a): The agent starts from point S and must reach the goal ...



Given the Grid World shown in Figure (a): the agent starts from point S and must reach the goal G(4,6). The gray areas represent walls, which the agent cannot pass through. Each step gives a reward of -1, so the objective is to reach the goal by the shortest possible path.

Modify the QLearning.ipynb file as follows:

1. Implement the gray wall areas in the class GridWorld().
2. Inside the training loop, after a fixed number of episodes (e.g., every 50 or 100 episodes), print a grid that displays the best action (the one with the highest Q-value) for each state using arrows, as in Figure (b). Leave the wall cells blank, without any arrows.
3. Repeat this throughout training so that about 8 to 10 grids are printed in total.

Q-learning

Algorithm 9-9: Q-Learning (State-Action Version of TD Control)

- Input: Episode generator
- Output: Optimal policy \( \hat{\pi} \), optimal state-action value function \( \hat{\mathrm{q}} \)

``` plaintext
for (s ∈ S and a ∈ A)
    initialize q(s, a) with arbitrary values
    if s is a terminal state, set q(s, a) = 0
repeat
    initialize the starting state s
    repeat
        choose an action a from state s according to q        // apply epsilon-greedy
        take action a, observe the next state s' and reward r  // role of MDP
        q(s, a) = q(s, a) + ρ(r + γ max_a' q(s', a') - q(s, a))  // Eq. (9.26)
        s = s'
    until (s' is a goal state)
until (termination condition is met)
q̂ = q
```

Q-learning Flowchart

1. Start
2. Initialize q-values with arbitrary values
3. Begin an episode (set the initial state s)
4. Select action a in state s using the epsilon-greedy policy
5. Perform action a, transition to the next state s', and receive reward r
6. Update the q-value of the previous state using the update formula
7. Is s' a terminal state?
   - No → set s = s' and go back to step 4
   - Yes → End
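Eq. (9.26) is the heart of the algorithm: the estimate q(s, a) moves a fraction ρ (the learning rate) of the way toward the TD target r + γ max_a' q(s', a'), where γ is the discount factor. The minimal Python sketch here applies it once; the helper name `q_update` and the values ρ = 0.1 and γ = 0.9 are assumptions for illustration, not taken from QLearning.ipynb.

```python
def q_update(q_sa, r, max_q_next, rho=0.1, gamma=0.9):
    """Apply Eq. (9.26) once and return the new q(s, a).

    q_sa       -- current estimate q(s, a)
    r          -- reward observed after taking action a in state s
    max_q_next -- max over a' of q(s', a') (0 when s' is terminal)
    rho, gamma -- learning rate and discount factor (assumed values)
    """
    td_target = r + gamma * max_q_next
    return q_sa + rho * (td_target - q_sa)


# One grid-world step: q(s, a) = 0, reward -1, best next-state value -2.
print(q_update(0.0, -1.0, -2.0))  # approximately -0.28
```

Worked by hand: 0 + 0.1 × (-1 + 0.9 × (-2) - 0) = 0.1 × (-2.8) = -0.28, i.e. the estimate moves 10% of the way from 0 toward the target -2.8.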

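The three requested modifications (walls in GridWorld, periodic arrow grids, blank wall cells) can be sketched together in one self-contained Python script. Figure (a) is not reproduced on this page, so the 5 × 7 grid, the start cell S = (0, 0), the 0-indexed (row, col) convention and the cells in `WALLS` are hypothetical placeholders to be replaced with the real layout. The names `GridWorld`, `policy_grid` and `train`, the hyperparameters (ρ = 0.1, γ = 0.9, ε = 0.1) and the episode cap `max_steps` are likewise assumptions and may not match the notebook.

```python
import random

import numpy as np

# Hypothetical layout: Figure (a) is not available, so the grid size, start
# cell and wall cells are placeholders. Cells are (row, col), 0-indexed.
N_ROWS, N_COLS = 5, 7
START, GOAL = (0, 0), (4, 6)
WALLS = {(1, 1), (1, 2), (1, 3), (2, 5), (3, 3), (3, 4), (3, 5)}

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ARROWS = ["↑", "↓", "←", "→"]


class GridWorld:
    """Deterministic grid with impassable walls; every step gives reward -1."""

    def __init__(self, n_rows=N_ROWS, n_cols=N_COLS, start=START, goal=GOAL, walls=WALLS):
        self.n_rows, self.n_cols = n_rows, n_cols
        self.start, self.goal = start, goal
        self.walls = set(walls)
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        dr, dc = ACTIONS[action]
        row, col = self.state[0] + dr, self.state[1] + dc
        # Moving off the grid or into a wall leaves the agent in place.
        if 0 <= row < self.n_rows and 0 <= col < self.n_cols and (row, col) not in self.walls:
            self.state = (row, col)
        return self.state, -1, self.state == self.goal


def epsilon_greedy(q, state, epsilon):
    """Random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return int(np.argmax(q[state]))


def policy_grid(env, q):
    """Arrow for the best action in each cell, 'G' at the goal, blank for walls."""
    lines = []
    for row in range(env.n_rows):
        cells = []
        for col in range(env.n_cols):
            if (row, col) in env.walls:
                cells.append(" ")
            elif (row, col) == env.goal:
                cells.append("G")
            else:
                cells.append(ARROWS[int(np.argmax(q[row, col]))])
        lines.append(" ".join(cells))
    return "\n".join(lines)


def train(env, n_episodes=1000, print_every=100, rho=0.1, gamma=0.9, epsilon=0.1, max_steps=500):
    # q[row, col, action] starts at 0; the goal's entries are never updated,
    # so the terminal state keeps q = 0 as the algorithm requires.
    q = np.zeros((env.n_rows, env.n_cols, len(ACTIONS)))
    for episode in range(1, n_episodes + 1):
        s = env.reset()
        for _ in range(max_steps):  # cap on episode length (assumption)
            a = epsilon_greedy(q, s, epsilon)
            s_next, r, done = env.step(a)
            row, col = s
            # Eq. (9.26)
            q[row, col, a] += rho * (r + gamma * q[s_next].max() - q[row, col, a])
            s = s_next
            if done:
                break
        if episode % print_every == 0:
            print(f"Episode {episode}")
            print(policy_grid(env, q), end="\n\n")
    return q


if __name__ == "__main__":
    random.seed(0)
    train(GridWorld(), n_episodes=1000, print_every=100)  # prints 10 grids
```

With 1000 episodes and `print_every=100` the script prints 10 grids; 500 episodes with `print_every=50` gives the same count. Marking the goal with G instead of an arrow is a presentation choice, not something the task specifies. Because every true value is negative here, zero initialization acts optimistically and pushes the agent to try each action at least once before settling on a route.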

