2023 DeepRacer Student League Reward Function Notes #01 — Summer Cohort

Eric Lee
Sep 28, 2023 · 4 min read

--

(This is the first in a series of notes I shared with a friend in Finland, which I believe might be helpful for others as well.)

This reward function, trained for 1 hour, finishes the race in 3’36. Although there is still a gap to the 3-minute mark, I believe it has good potential.

This reward function is a combination of three parts, and I will go into detail on each of them later in this note. First, the distance_reward: the model gets more reward the closer it is to the middle of the track. Second, the speed_reward, which encourages the model to go faster. Lastly, the direction_reward: the model gets the maximum reward when it is heading the same way the track goes.

distance_reward

This is no different from the default reward function, except that instead of having three markers on the track, I implement a continuous reward rather than a discrete one. Here is what I mean:

(figures: discrete reward with three markers vs. continuous reward)

This way the model can learn to keep to the center smoothly, without trying to guess where the markers are. It felt like a whole new world to me when I first encountered this idea. It can be implemented with the following code.

distance_reward = 1 - (distance_from_center / (track_width * 0.5))

If you want to further guide your model toward the center, why not give it much more reward when it is closer to the middle line? That is where non-linear functions come in.

(figure: y = 1 - x²)
distance_reward = 1 - (distance_from_center / (track_width * 0.5)) ** 2

(P.S. My latest model still uses the linear version, because I don’t think staying in the middle is a must. The non-linear one might give me a performance boost, but I need some time to figure that out.)
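The two variants above can be packaged into one small helper. This is a minimal sketch of how I would wire them into the DeepRacer `params` dict; the function name and the `use_nonlinear` flag are my own illustrative additions.

```python
def distance_reward_fn(params, use_nonlinear=False):
    # Fraction of the half-track-width that the car has drifted from center:
    # 0.0 means dead center, 1.0 means at the track edge.
    ratio = params['distance_from_center'] / (params['track_width'] * 0.5)
    if use_nonlinear:
        # Quadratic: reward drops slowly near the center line,
        # then falls faster toward the edges.
        return 1 - ratio ** 2
    # Linear: reward decreases at a constant rate away from the center.
    return 1 - ratio
```

For example, a car halfway between the center line and the edge gets 0.5 with the linear version but 0.75 with the quadratic one, which is exactly the "more reward near the middle" effect.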

speed_reward

For speed, I used a similar idea to the one above, but with an extra condition. I want the model to keep its speed above 0.8 in order to finish the race on time. To do that, I set max_speed_diff to 0.2 (the allowed gap below a target speed of 1.0), and implemented it like this.

max_speed_diff = 0.2
speed_diff = abs(1.0 - speed)

if speed_diff < max_speed_diff:
    speed_reward = 1 - (speed_diff / max_speed_diff) ** 2
else:
    speed_reward = 0.001
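To see what this curve actually does, here is the same logic as a standalone function with a few sample speeds (the function name and parameters are my own):

```python
def speed_reward_fn(speed, target_speed=1.0, max_speed_diff=0.2):
    # How far the current speed is from the target.
    speed_diff = abs(target_speed - speed)
    if speed_diff < max_speed_diff:
        # Quadratic falloff: full reward at the target speed,
        # approaching 0 as the gap nears max_speed_diff.
        return 1 - (speed_diff / max_speed_diff) ** 2
    # Speed outside [0.8, 1.2]: almost no reward.
    return 0.001

speed_reward_fn(1.0)  # 1.0
speed_reward_fn(0.9)  # 0.75
speed_reward_fn(0.7)  # 0.001
```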

direction_reward

This one is kind of tricky, but once you understand it, it’s super simple. First, we get the two waypoints closest to our car on the track: one ahead of it and one behind. We can calculate the direction of the track from these two waypoints with some math functions. Then we compare the track direction with the direction the car is heading. With the continuous-reward concept, the direction_reward would be something like this.

prev_point = waypoints[closest_waypoints[0]]
next_point = waypoints[closest_waypoints[1]]
track_direction = math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
track_direction = math.degrees(track_direction)

# heading can be obtained from params['heading']
direction_diff = abs(track_direction - heading)
# wrap around so the difference always stays within [0, 180]
if direction_diff > 180:
    direction_diff = 360 - direction_diff

direction_reward = 1 - (direction_diff / 180) ** 5

(P.S. I used **5 because I don’t want to punish the model too hard when it is only slightly off course.)
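One subtlety worth checking: `math.atan2` returns angles in (−180°, 180°], so the raw difference can be close to 360° even when the car is almost perfectly aligned with the track. A quick sanity check (the helper name is mine):

```python
import math

def direction_reward_fn(track_direction, heading):
    # Raw angular difference between track direction and car heading, in degrees.
    direction_diff = abs(track_direction - heading)
    # Wrap around: 358 degrees apart is really only 2 degrees apart.
    if direction_diff > 180:
        direction_diff = 360 - direction_diff
    # Fifth power keeps the penalty gentle for small deviations.
    return 1 - (direction_diff / 180) ** 5
```

Without the wrap-around, a track direction of 179° and a heading of −179° would be treated as a huge error instead of a 2° one, and the reward would go strongly negative.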

Reward Summary

In the end, we have to combine all three rewards. Here are my models so far.

  1. 0.2 distance + 0.6 speed + 0.2 direction (trained for 30 mins, finishes in 4’20)
    This is the worst model I have. It keeps running off the track on specific corners. A possible explanation is that with so much weight on speed, the model just won’t slow down.
  2. 0.2 distance + 0.3 speed + 0.5 direction (trained for 60 mins, finishes in 3’36)
    This model has never gone off the track, which is good. Unfortunately, it couldn’t finish the race within the time limit.
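Putting the three parts together, the weighted sum for model #2 would look roughly like this. The `reward_function(params)` signature is the standard DeepRacer one; the exact layout of the body is my own sketch of the pieces described above.

```python
import math

def reward_function(params):
    # --- distance: 1.0 at the center line, 0.0 at the track edge ---
    distance_reward = 1 - (params['distance_from_center'] /
                           (params['track_width'] * 0.5))

    # --- speed: quadratic falloff around a target speed of 1.0 ---
    max_speed_diff = 0.2
    speed_diff = abs(1.0 - params['speed'])
    if speed_diff < max_speed_diff:
        speed_reward = 1 - (speed_diff / max_speed_diff) ** 2
    else:
        speed_reward = 0.001

    # --- direction: compare track direction with the car's heading ---
    waypoints = params['waypoints']
    prev_point = waypoints[params['closest_waypoints'][0]]
    next_point = waypoints[params['closest_waypoints'][1]]
    track_direction = math.degrees(math.atan2(next_point[1] - prev_point[1],
                                              next_point[0] - prev_point[0]))
    direction_diff = abs(track_direction - params['heading'])
    if direction_diff > 180:
        direction_diff = 360 - direction_diff
    direction_reward = 1 - (direction_diff / 180) ** 5

    # Weighted combination used by model #2.
    return float(0.2 * distance_reward +
                 0.3 * speed_reward +
                 0.5 * direction_reward)
```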

Future Work

  • progress reward
  • speed up on straight lines, which can be achieved in two ways:
  1. A conditional reward function: if the car is heading in the direction of the track, weight speed more heavily; otherwise, weight direction more heavily.
  2. Figure out all the waypoints on straight segments, and use a different reward function when the car is on those points.
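A first sketch of option 1, shifting weight toward speed only when the car is already well aligned with the track. The 10° threshold and the two sets of weights here are placeholders I made up, not tuned values:

```python
def combine_rewards(distance_reward, speed_reward, direction_reward,
                    direction_diff):
    # Hypothetical threshold: treat under 10 degrees as "well aligned".
    if direction_diff < 10:
        # Aligned with the track (likely a straight): push harder on speed.
        return (0.1 * distance_reward +
                0.7 * speed_reward +
                0.2 * direction_reward)
    # Misaligned (likely a corner): prioritize fixing the heading first.
    return (0.2 * distance_reward +
            0.2 * speed_reward +
            0.6 * direction_reward)
```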
