Project: Train a Quadcopter How to Fly

Design an agent to fly a quadcopter, and then train it using a reinforcement learning algorithm of your choice!

Try to apply the techniques you have learnt, but also feel free to come up with innovative ideas and test them.

Instructions

Take a look at the files in the directory to better understand the structure of the project.

  • task.py: Define your task (environment) in this file.
  • agents/: Folder containing reinforcement learning agents.
    • policy_search.py: A sample agent has been provided here.
    • agent.py: Develop your agent here.
  • physics_sim.py: This file contains the simulator for the quadcopter. DO NOT MODIFY THIS FILE.

For this project, you will define your own task in task.py. Although we have provided an example task to get you started, you are encouraged to change it. Later in this notebook, you will learn more about how to amend this file.

You will also design a reinforcement learning agent in agent.py to complete your chosen task.

You are welcome to create any additional files to help you to organize your code. For instance, you may find it useful to define a model.py file defining any needed neural network architectures.

Controlling the Quadcopter

We provide a sample agent in the code cell below to show you how to use the sim to control the quadcopter. This agent is even simpler than the sample agent that you'll examine (in agents/policy_search.py) later in this notebook!

The agent controls the quadcopter by setting the revolutions per second on each of its four rotors. The provided agent in the Basic_Agent class below always selects a random action for each of the four rotors. These four speeds are returned by the act method as a list of four floating-point numbers.

For this project, the agent that you will implement in agents/agent.py will have a far more intelligent method for selecting actions!

In [1]:
import random

class Basic_Agent():
    def __init__(self, task):
        self.task = task
    
    def act(self):
        new_thrust = random.gauss(450., 25.)
        return [new_thrust + random.gauss(0., 1.) for x in range(4)]

Run the code cell below to have the agent select actions to control the quadcopter.

Feel free to change the provided values of runtime, init_pose, init_velocities, and init_angle_velocities below to change the starting conditions of the quadcopter.

The labels list below annotates statistics that are saved while running the simulation. All of this information is saved in a text file data.txt and stored in the dictionary results.

In [2]:
%load_ext autoreload
%autoreload 2

import csv
import numpy as np
from task import Task

# Modify the values below to give the quadcopter a different starting position.
runtime = 5.                                     # time limit of the episode
init_pose = np.array([0., 0., 10., 0., 0., 0.])  # initial pose
init_velocities = np.array([0., 0., 0.])         # initial velocities
init_angle_velocities = np.array([0., 0., 0.])   # initial angle velocities
file_output = 'data.txt'                         # file name for saved results

# Setup
task = Task(init_pose, init_velocities, init_angle_velocities, runtime)
agent = Basic_Agent(task)
done = False
labels = ['time', 'x', 'y', 'z', 'phi', 'theta', 'psi', 'x_velocity',
          'y_velocity', 'z_velocity', 'phi_velocity', 'theta_velocity',
          'psi_velocity', 'rotor_speed1', 'rotor_speed2', 'rotor_speed3', 'rotor_speed4']
results = {x : [] for x in labels}

# Run the simulation, and save the results.
with open(file_output, 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(labels)
    while True:
        rotor_speeds = agent.act()
        _, _, done = task.step(rotor_speeds)
        to_write = [task.sim.time] + list(task.sim.pose) + list(task.sim.v) + list(task.sim.angular_v) + list(rotor_speeds)
        for ii in range(len(labels)):
            results[labels[ii]].append(to_write[ii])
        writer.writerow(to_write)
        if done:
            break

Run the code cell below to visualize how the position of the quadcopter evolved during the simulation.

In [3]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(results['time'], results['x'], label='x')
plt.plot(results['time'], results['y'], label='y')
plt.plot(results['time'], results['z'], label='z')
plt.legend()
_ = plt.ylim()

The next code cell visualizes the velocity of the quadcopter.

In [4]:
plt.plot(results['time'], results['x_velocity'], label='x_hat')
plt.plot(results['time'], results['y_velocity'], label='y_hat')
plt.plot(results['time'], results['z_velocity'], label='z_hat')
plt.legend()
_ = plt.ylim()

Next, you can plot the Euler angles (the rotation of the quadcopter over the $x$-, $y$-, and $z$-axes),

In [5]:
plt.plot(results['time'], results['phi'], label='phi')
plt.plot(results['time'], results['theta'], label='theta')
plt.plot(results['time'], results['psi'], label='psi')
plt.legend()
_ = plt.ylim()

before plotting the velocities (in radians per second) corresponding to each of the Euler angles.

In [6]:
plt.plot(results['time'], results['phi_velocity'], label='phi_velocity')
plt.plot(results['time'], results['theta_velocity'], label='theta_velocity')
plt.plot(results['time'], results['psi_velocity'], label='psi_velocity')
plt.legend()
_ = plt.ylim()

Finally, you can use the code cell below to print the agent's choice of actions.

In [7]:
plt.plot(results['time'], results['rotor_speed1'], label='Rotor 1 revolutions / second')
plt.plot(results['time'], results['rotor_speed2'], label='Rotor 2 revolutions / second')
plt.plot(results['time'], results['rotor_speed3'], label='Rotor 3 revolutions / second')
plt.plot(results['time'], results['rotor_speed4'], label='Rotor 4 revolutions / second')
plt.legend()
_ = plt.ylim()

When specifying a task, you will derive the environment state from the simulator. Run the code cell below to print the values of the following variables at the end of the simulation:

  • task.sim.pose (the position of the quadcopter in ($x,y,z$) dimensions and the Euler angles),
  • task.sim.v (the velocity of the quadcopter in ($x,y,z$) dimensions), and
  • task.sim.angular_v (radians/second for each of the three Euler angles).
In [8]:
# the pose, velocity, and angular velocity of the quadcopter at the end of the episode
print(task.sim.pose)
print(task.sim.v)
print(task.sim.angular_v)
[ 21.91648641   9.83504189  25.50352072   0.40012246   5.36300823   0.        ]
[ 13.87484575   5.6141298   -1.29935918]
[ 0.02636665 -0.27910436  0.        ]

In the sample task in task.py, we use the 6-dimensional pose of the quadcopter to construct the state of the environment at each timestep. However, when amending the task for your purposes, you are welcome to expand the size of the state vector by including the velocity information. You can use any combination of the pose, velocity, and angular velocity - feel free to tinker here, and construct the state to suit your task.
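
For example, a state that also includes the velocities could be built roughly as in the sketch below. This is only an illustration (make_state is a hypothetical helper, and state_size in your Task would need to grow to action_repeat * 12 to match):

import numpy as np

# Hypothetical helper: build a richer state from the simulator.
def make_state(sim):
    return np.concatenate([sim.pose,        # x, y, z, phi, theta, psi (6 values)
                           sim.v,           # linear velocities (3 values)
                           sim.angular_v])  # Euler angle velocities (3 values)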

The Task

A sample task has been provided for you in task.py. Open this file in a new window now.

The __init__() method is used to initialize several variables that are needed to specify the task.

  • The simulator is initialized as an instance of the PhysicsSim class (from physics_sim.py).
  • Inspired by the methodology in the original DDPG paper, we make use of action repeats. For each timestep of the agent, we step the simulation action_repeats timesteps. If you are not familiar with action repeats, please read the Results section in the DDPG paper.
  • We set the number of elements in the state vector. For the sample task, we only work with the 6-dimensional pose information. To set the size of the state (state_size), we must take action repeats into account.
  • The environment will always have a 4-dimensional action space, with one entry for each rotor (action_size=4). You can set the minimum (action_low) and maximum (action_high) values of each entry here.
  • The sample task in this provided file is for the agent to reach a target position. We specify that target position as a variable.

The reset() method resets the simulator. The agent should call this method every time the episode ends. You can see an example of this in the code cell below.

The step() method is perhaps the most important. It accepts the agent's choice of action rotor_speeds, which is used to prepare the next state to pass on to the agent. Then, the reward is computed from get_reward(). The episode is considered done if the time limit has been exceeded, or the quadcopter has travelled outside of the bounds of the simulation.
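
Putting these pieces together, the sample task is roughly of the following form. Treat this as a sketch rather than the authoritative code: the concrete values (an action repeat of 3, a rotor-speed range of 0 to 900) and the next_timestep() call reflect how the provided task.py and physics_sim.py fit together, but check the files themselves, and the reward shown is just the simple distance-based example.

import numpy as np
from physics_sim import PhysicsSim

class Task():
    def __init__(self, init_pose=None, init_velocities=None,
                 init_angle_velocities=None, runtime=5., target_pos=None):
        # Simulator instance (physics_sim.py itself must not be modified)
        self.sim = PhysicsSim(init_pose, init_velocities, init_angle_velocities, runtime)
        # Each agent step advances the simulation this many timesteps
        self.action_repeat = 3
        # State: the 6-dimensional pose, repeated action_repeat times
        self.state_size = self.action_repeat * 6
        # One rotor speed per rotor, bounded below and above
        self.action_low = 0
        self.action_high = 900
        self.action_size = 4
        # Target position for the sample "reach a position" task
        self.target_pos = target_pos if target_pos is not None else np.array([0., 0., 10.])

    def get_reward(self):
        # Simple example: reward closeness to the target position
        return 1. - .3 * (abs(self.sim.pose[:3] - self.target_pos)).sum()

    def step(self, rotor_speeds):
        reward = 0
        pose_all = []
        for _ in range(self.action_repeat):
            # done becomes True once the time limit is exceeded or the
            # quadcopter leaves the bounds of the simulation
            done = self.sim.next_timestep(rotor_speeds)
            reward += self.get_reward()
            pose_all.append(self.sim.pose)
        next_state = np.concatenate(pose_all)
        return next_state, reward, done

    def reset(self):
        self.sim.reset()
        return np.concatenate([self.sim.pose] * self.action_repeat)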

In the next section, you will learn how to test the performance of an agent on this task.

The Agent

The sample agent given in agents/policy_search.py uses a very simplistic linear policy to directly compute the action vector as a dot product of the state vector and a matrix of weights. Then, it randomly perturbs the parameters by adding some Gaussian noise, to produce a different policy. Based on the average reward obtained in each episode (score), it keeps track of the best set of parameters found so far, how the score is changing, and accordingly tweaks a scaling factor to widen or tighten the noise.
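
Conceptually, the core of this hill-climbing policy search can be sketched as follows. This is a simplified illustration of the idea, not the exact code in agents/policy_search.py (method names such as end_episode are hypothetical):

import numpy as np

class SimplePolicySearch():
    def __init__(self, task):
        self.task = task
        # Linear policy: action = state . w
        self.w = np.random.normal(size=(task.state_size, task.action_size),
                                  scale=(task.action_high - task.action_low) / (2 * task.state_size))
        self.best_w = self.w
        self.best_score = -np.inf
        self.noise_scale = 0.1

    def act(self, state):
        # Action vector as a dot product of the state and the weight matrix
        return np.dot(state, self.w)

    def end_episode(self, score):
        # Keep the best parameters found so far; tighten the noise after an
        # improvement (exploit), widen it otherwise (explore more).
        if score > self.best_score:
            self.best_score, self.best_w = score, self.w
            self.noise_scale = max(0.5 * self.noise_scale, 0.01)
        else:
            self.noise_scale = min(2.0 * self.noise_scale, 3.25)
        # Perturb the best parameters with Gaussian noise to get a new policy
        self.w = self.best_w + self.noise_scale * np.random.normal(size=self.w.shape)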

Run the code cell below to see how the agent performs on the sample task.

In [9]:
import sys
import pandas as pd
from agents.policy_search import PolicySearch_Agent
from task import Task

num_episodes = 1000
target_pos = np.array([0., 0., 10.])
task = Task(target_pos=target_pos)
agent = PolicySearch_Agent(task) 

for i_episode in range(1, num_episodes+1):
    state = agent.reset_episode() # start a new episode
    while True:
        action = agent.act(state) 
        next_state, reward, done = task.step(action)
        agent.step(reward, done)
        state = next_state
        if done:
            print("\rEpisode = {:4d}, score = {:7.3f} (best = {:7.3f}), noise_scale = {}".format(
                i_episode, agent.score, agent.best_score, agent.noise_scale), end="")  # [debug]
            break
    sys.stdout.flush()
Episode = 1000, score =  -2.127 (best =  -0.085), noise_scale = 3.2625

This agent should perform very poorly on this task. And that's where you come in!

Define the Task, Design the Agent, and Train Your Agent!

Amend task.py to specify a task of your choosing. If you're unsure what kind of task to specify, you may like to teach your quadcopter to takeoff, hover in place, land softly, or reach a target pose.

After specifying your task, use the sample agent in agents/policy_search.py as a template to define your own agent in agents/agent.py. You can borrow whatever you need from the sample agent, including ideas on how you might modularize your code (using helper methods like act(), learn(), reset_episode(), etc.).
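
Whatever algorithm you choose, the interface your agent needs in this notebook is small. A minimal skeleton consistent with the training loops below might look like this (the method names match how the agent is called later in the notebook; everything else is up to you):

class Agent():
    def __init__(self, task):
        self.task = task
        # ... build networks/tables, replay memory, noise process, etc.

    def reset_episode(self):
        state = self.task.reset()
        # ... reset any per-episode bookkeeping (noise, score, last state)
        return state

    def act(self, state):
        # ... return a 4-dimensional action within [action_low, action_high]
        raise NotImplementedError

    def step(self, action, reward, next_state, done):
        # ... store the experience and learn from it (e.g. replay a minibatch)
        raise NotImplementedError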

Note that it is highly unlikely that the first agent and task that you specify will learn well. You will likely have to tweak various hyperparameters and the reward function for your task until you arrive at reasonably good behavior.

As you develop your agent, it's important to keep an eye on how it's performing. Use the code above as inspiration to build in a mechanism to log/save the total rewards obtained in each episode to file. If the episode rewards are gradually increasing, this is an indication that your agent is learning.

In [17]:
import numpy as np
import sys
from agents.agent import Agent
from task import Task

num_episodes = 500
# Take-off task
# The quadcopter starts at rest on the ground, with a target height of 100 above the starting point.
init_pos = np.array([0., 0., 0., 0., 0., 0.])
target_pos = np.array([0., 0., 100.])
task = Task(init_pose=init_pos, target_pos=target_pos, runtime=10.)
agent = Agent(task)
# save rewards for plotting
rewards = []

for i_episode in range(1, num_episodes+1):
    state = agent.reset_episode() # start a new episode
    step = 0
    while True:
        step += 1
        action = agent.act(state)
        next_state, reward, done = task.step(action)
        agent.step(action, reward, next_state, done)
        state = next_state
        if done:
            rewards.append(agent.score)
            print("\r\nEp={:4d}, score={:7.3f} (best={:7.3f}) pos={} {} {}".format(
                i_episode,
                agent.score,
                agent.best_score,
                round(task.sim.pose[0], 2),
                round(task.sim.pose[1], 2),
                round(task.sim.pose[2], 2)), end="")  # [debug]
            break
    sys.stdout.flush()
Using TensorFlow backend.
Ep=   1, score=2640.188 (best=2640.188) pos=-1.7 0.65 0.0
Ep=   2, score=2881.357 (best=2881.357) pos=-0.99 1.68 0.0
Ep=   3, score=2640.188 (best=2881.357) pos=-1.7 0.65 0.0
Ep=   4, score=2881.357 (best=2881.357) pos=-0.99 1.68 0.0
Ep=   5, score=2640.188 (best=2881.357) pos=-1.7 0.65 0.0
Ep=   6, score=3121.370 (best=3121.370) pos=-1.14 1.8 0.0
Ep=   7, score=2880.923 (best=3121.370) pos=-0.18 0.49 0.0
Ep=   8, score=959.846 (best=3121.370) pos=-0.07 -0.0 0.0
Ep=   9, score=1199.799 (best=3121.370) pos=-0.11 -0.0 0.0
Ep=  10, score=719.853 (best=3121.370) pos=-0.08 -0.0 0.0
Ep=  11, score=719.852 (best=3121.370) pos=0.0 0.0 0.0
Ep=  12, score=3842.895 (best=3842.895) pos=-0.39 0.26 0.0
Ep=  13, score=719.891 (best=3842.895) pos=-0.05 -0.0 0.0
Ep=  14, score=719.932 (best=3842.895) pos=-0.25 -0.0 0.0
Ep=  15, score=11998.761 (best=11998.761) pos=150.0 -6.8 119.88
Ep=  16, score=20175.560 (best=20175.560) pos=150.0 -78.58 300.0
Ep=  17, score=72225.119 (best=72225.119) pos=65.04 -6.53 257.44
Ep=  18, score=77803.461 (best=77803.461) pos=7.7 3.22 268.17
Ep=  19, score=78798.494 (best=78798.494) pos=4.08 1.33 268.3
Ep=  20, score=78798.038 (best=78798.494) pos=2.28 -3.77 268.29
Ep=  21, score=78796.936 (best=78798.494) pos=-0.98 -0.26 268.35
Ep=  22, score=78798.519 (best=78798.519) pos=3.79 2.01 268.29
Ep=  23, score=78796.395 (best=78798.519) pos=1.49 -0.16 268.34
Ep=  24, score=78797.767 (best=78798.519) pos=-1.83 1.63 268.33
Ep=  25, score=78797.513 (best=78798.519) pos=2.62 1.21 268.32
Ep=  26, score=78796.854 (best=78798.519) pos=1.92 0.63 268.34
Ep=  27, score=78797.229 (best=78798.519) pos=-2.12 -1.19 268.34
Ep=  28, score=78799.678 (best=78799.678) pos=-4.75 1.0 268.27
Ep=  29, score=78796.349 (best=78799.678) pos=0.99 -1.28 268.34
Ep=  30, score=78797.425 (best=78799.678) pos=-2.2 -1.56 268.33
Ep=  31, score=78797.999 (best=78799.678) pos=-0.93 2.5 268.33
Ep=  32, score=78796.809 (best=78799.678) pos=1.81 -2.0 268.33
Ep=  33, score=78797.312 (best=78799.678) pos=-2.36 -0.56 268.33
Ep=  34, score=78797.026 (best=78799.678) pos=0.13 1.13 268.35
Ep=  35, score=78796.743 (best=78799.678) pos=0.41 -1.86 268.34
Ep=  36, score=78797.263 (best=78799.678) pos=0.96 2.38 268.34
Ep=  37, score=78797.478 (best=78799.678) pos=2.67 1.11 268.33
Ep=  38, score=78797.120 (best=78799.678) pos=-0.25 -2.46 268.33
Ep=  39, score=78797.829 (best=78799.678) pos=-0.81 -3.35 268.32
Ep=  40, score=78796.943 (best=78799.678) pos=0.34 1.42 268.34
Ep=  41, score=78797.881 (best=78799.678) pos=-3.19 -1.28 268.31
Ep=  42, score=78796.926 (best=78799.678) pos=0.58 -2.48 268.33
Ep=  43, score=78797.929 (best=78799.678) pos=-2.0 -2.76 268.31
Ep=  44, score=78797.004 (best=78799.678) pos=1.17 0.8 268.34
Ep=  45, score=78798.465 (best=78799.678) pos=-0.19 3.76 268.3
Ep=  46, score=78797.429 (best=78799.678) pos=0.64 -3.44 268.31
Ep=  47, score=78797.081 (best=78799.678) pos=-1.09 -1.76 268.34
Ep=  48, score=78796.589 (best=78799.678) pos=0.02 -1.33 268.35
Ep=  49, score=78796.347 (best=78799.678) pos=0.46 -0.31 268.35
Ep=  50, score=78797.048 (best=78799.678) pos=-1.89 -0.49 268.34
Ep=  51, score=78797.371 (best=78799.678) pos=-0.53 1.12 268.35
Ep=  52, score=78796.678 (best=78799.678) pos=0.81 -2.33 268.33
Ep=  53, score=78796.619 (best=78799.678) pos=0.31 0.85 268.35
Ep=  54, score=78796.530 (best=78799.678) pos=1.36 -1.57 268.34
Ep=  55, score=78798.400 (best=78799.678) pos=-4.05 -0.51 268.29
Ep=  56, score=77801.797 (best=78799.678) pos=0.56 -7.42 268.19
Ep=  57, score=78796.366 (best=78799.678) pos=0.49 -1.64 268.34
Ep=  58, score=78797.306 (best=78799.678) pos=-2.23 -1.08 268.33
Ep=  59, score=78797.710 (best=78799.678) pos=-0.34 2.62 268.33
Ep=  60, score=78796.996 (best=78799.678) pos=-0.09 -2.34 268.33
Ep=  61, score=78796.826 (best=78799.678) pos=-0.89 -0.85 268.35
Ep=  62, score=78796.878 (best=78799.678) pos=1.15 0.74 268.34
Ep=  63, score=78797.222 (best=78799.678) pos=0.31 -2.94 268.32
Ep=  64, score=78797.048 (best=78799.678) pos=2.77 -1.41 268.32
Ep=  65, score=78797.125 (best=78799.678) pos=0.38 -3.17 268.32
Ep=  66, score=78797.471 (best=78799.678) pos=-1.05 1.46 268.34
Ep=  67, score=78796.837 (best=78799.678) pos=-0.75 -0.14 268.35
Ep=  68, score=78797.752 (best=78799.678) pos=-1.25 1.87 268.34
Ep=  69, score=78797.340 (best=78799.678) pos=1.34 2.07 268.34
Ep=  70, score=80387.577 (best=80387.577) pos=-11.3 0.31 213.91
Ep=  71, score=2160.744 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  72, score=2160.776 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  73, score=2160.776 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  74, score=2160.772 (best=80387.577) pos=-1.61 0.0 0.0
Ep=  75, score=2160.776 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  76, score=2160.761 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  77, score=2160.777 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  78, score=2160.776 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  79, score=2160.776 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  80, score=2160.777 (best=80387.577) pos=-1.61 0.0 0.0
Ep=  81, score=2160.777 (best=80387.577) pos=-1.61 0.0 0.0
Ep=  82, score=2160.776 (best=80387.577) pos=-1.61 0.0 0.0
Ep=  83, score=2160.777 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  84, score=2160.777 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  85, score=2160.743 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  86, score=2160.771 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  87, score=2160.776 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  88, score=2160.743 (best=80387.577) pos=-1.61 0.0 0.0
Ep=  89, score=2160.777 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  90, score=2160.776 (best=80387.577) pos=-1.61 0.0 0.0
Ep=  91, score=2160.776 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  92, score=2160.758 (best=80387.577) pos=-1.61 0.0 0.0
Ep=  93, score=2160.777 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  94, score=2160.776 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  95, score=2160.755 (best=80387.577) pos=-1.61 0.0 0.0
Ep=  96, score=2160.776 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  97, score=2160.743 (best=80387.577) pos=-1.61 0.0 0.0
Ep=  98, score=2160.777 (best=80387.577) pos=-1.61 -0.0 0.0
Ep=  99, score=2160.761 (best=80387.577) pos=-1.61 0.0 0.0
Ep= 100, score=2160.772 (best=80387.577) pos=-1.61 -0.0 0.0
Ep= 101, score=2160.744 (best=80387.577) pos=-1.61 0.0 0.0
Ep= 102, score=2160.760 (best=80387.577) pos=-1.61 0.0 0.0
Ep= 103, score=2160.744 (best=80387.577) pos=-1.61 -0.0 0.0
Ep= 104, score=2160.745 (best=80387.577) pos=-1.61 0.0 0.0
Ep= 105, score=2160.745 (best=80387.577) pos=-1.61 0.0 0.0
Ep= 106, score=2160.778 (best=80387.577) pos=-1.61 -0.0 0.0
Ep= 107, score=2160.745 (best=80387.577) pos=-1.61 0.0 0.0
Ep= 108, score=2160.778 (best=80387.577) pos=-1.61 0.0 0.0
Ep= 109, score=2160.755 (best=80387.577) pos=-1.61 0.0 0.0
Ep= 110, score=2160.781 (best=80387.577) pos=-1.61 -0.0 0.0
Ep= 111, score=2160.765 (best=80387.577) pos=-1.61 0.0 0.0
Ep= 112, score=2160.785 (best=80387.577) pos=-1.61 0.0 0.0
Ep= 113, score=2160.796 (best=80387.577) pos=-1.6 -0.0 0.0
Ep= 114, score=2160.847 (best=80387.577) pos=-1.59 -0.0 0.0
Ep= 115, score=2641.551 (best=80387.577) pos=-1.83 0.0 0.0
Ep= 116, score=479.879 (best=80387.577) pos=0.05 0.0 0.0
Ep= 117, score=479.882 (best=80387.577) pos=-0.03 -0.0 0.0
Ep= 118, score=479.881 (best=80387.577) pos=0.02 0.0 0.0
Ep= 119, score=479.902 (best=80387.577) pos=-0.09 -0.0 0.0
Ep= 120, score=11077.812 (best=80387.577) pos=31.41 -0.0 0.0
Ep= 121, score=105038.736 (best=105038.736) pos=-0.36 3.67 157.67
Ep= 122, score=479.892 (best=105038.736) pos=0.02 0.0 0.0
Ep= 123, score=479.904 (best=105038.736) pos=0.06 -0.0 0.0
Ep= 124, score=479.923 (best=105038.736) pos=0.09 -0.0 0.0
Ep= 125, score=719.895 (best=105038.736) pos=0.17 -0.0 0.0
Ep= 126, score=959.918 (best=105038.736) pos=0.3 -0.0 0.0
Ep= 127, score=1440.059 (best=105038.736) pos=0.38 0.0 0.0
Ep= 128, score=1440.395 (best=105038.736) pos=0.33 0.0 0.0
Ep= 129, score=1680.550 (best=105038.736) pos=0.41 0.0 0.0
Ep= 130, score=1920.652 (best=105038.736) pos=0.4 0.0 0.0
Ep= 131, score=1920.831 (best=105038.736) pos=0.34 0.0 0.0
Ep= 132, score=2160.874 (best=105038.736) pos=0.3 -0.0 0.0
Ep= 133, score=2160.928 (best=105038.736) pos=0.27 0.0 0.0
Ep= 134, score=2160.994 (best=105038.736) pos=0.24 -0.0 0.0
Ep= 135, score=2161.070 (best=105038.736) pos=0.21 -0.0 0.0
Ep= 136, score=2161.067 (best=105038.736) pos=0.2 -0.0 0.0
Ep= 137, score=2161.086 (best=105038.736) pos=0.18 0.0 0.0
Ep= 138, score=2161.138 (best=105038.736) pos=0.15 0.0 0.0
Ep= 139, score=2161.172 (best=105038.736) pos=0.13 0.0 0.0
Ep= 140, score=2161.194 (best=105038.736) pos=0.11 0.0 0.0
Ep= 141, score=2161.236 (best=105038.736) pos=0.09 -0.0 0.0
Ep= 142, score=2161.243 (best=105038.736) pos=0.07 0.0 0.0
Ep= 143, score=2161.303 (best=105038.736) pos=0.04 -0.0 0.0
Ep= 144, score=2161.288 (best=105038.736) pos=0.02 0.0 0.0
Ep= 145, score=2161.296 (best=105038.736) pos=0.01 0.0 0.0
Ep= 146, score=2161.316 (best=105038.736) pos=0.0 -0.0 0.0
Ep= 147, score=2161.330 (best=105038.736) pos=-0.02 0.0 0.0
Ep= 148, score=2161.326 (best=105038.736) pos=-0.04 0.0 0.0
Ep= 149, score=2161.350 (best=105038.736) pos=-0.06 -0.0 0.0
Ep= 150, score=2161.374 (best=105038.736) pos=-0.08 -0.0 0.0
Ep= 151, score=2161.375 (best=105038.736) pos=-0.08 -0.0 0.0
Ep= 152, score=2161.343 (best=105038.736) pos=-0.08 0.0 0.0
Ep= 153, score=2161.355 (best=105038.736) pos=-0.1 0.0 0.0
Ep= 154, score=2161.381 (best=105038.736) pos=-0.11 -0.0 0.0
Ep= 155, score=2161.355 (best=105038.736) pos=-0.14 0.0 0.0
Ep= 156, score=2161.380 (best=105038.736) pos=-0.15 -0.0 0.0
Ep= 157, score=2161.377 (best=105038.736) pos=-0.17 -0.0 0.0
Ep= 158, score=2161.342 (best=105038.736) pos=-0.18 0.0 0.0
Ep= 159, score=2161.353 (best=105038.736) pos=-0.18 0.0 0.0
Ep= 160, score=2161.373 (best=105038.736) pos=-0.18 -0.0 0.0
Ep= 161, score=2161.372 (best=105038.736) pos=-0.19 -0.0 0.0
Ep= 162, score=2161.358 (best=105038.736) pos=-0.21 -0.0 0.0
Ep= 163, score=2161.353 (best=105038.736) pos=-0.23 -0.0 0.0
Ep= 164, score=2161.320 (best=105038.736) pos=-0.23 -0.0 0.0
Ep= 165, score=2161.350 (best=105038.736) pos=-0.24 -0.0 0.0
Ep= 166, score=2161.335 (best=105038.736) pos=-0.24 0.0 0.0
Ep= 167, score=2161.311 (best=105038.736) pos=-0.25 0.0 0.0
Ep= 168, score=2161.325 (best=105038.736) pos=-0.25 -0.0 0.0
Ep= 169, score=2161.336 (best=105038.736) pos=-0.26 -0.0 0.0
Ep= 170, score=2161.336 (best=105038.736) pos=-0.26 -0.0 0.0
Ep= 171, score=2161.305 (best=105038.736) pos=-0.26 0.0 0.0
Ep= 172, score=2161.310 (best=105038.736) pos=-0.25 0.0 0.0
Ep= 173, score=2161.311 (best=105038.736) pos=-0.25 0.0 0.0
Ep= 174, score=2161.349 (best=105038.736) pos=-0.24 -0.0 0.0
Ep= 175, score=2161.350 (best=105038.736) pos=-0.24 -0.0 0.0
Ep= 176, score=2161.325 (best=105038.736) pos=-0.24 0.0 0.0
Ep= 177, score=2161.315 (best=105038.736) pos=-0.24 0.0 0.0
Ep= 178, score=2161.314 (best=105038.736) pos=-0.24 0.0 0.0
Ep= 179, score=2161.314 (best=105038.736) pos=-0.24 0.0 0.0
Ep= 180, score=2161.333 (best=105038.736) pos=-0.24 0.0 0.0
Ep= 181, score=2161.338 (best=105038.736) pos=-0.25 0.0 0.0
Ep= 182, score=2161.303 (best=105038.736) pos=-0.26 0.0 0.0
Ep= 183, score=2161.329 (best=105038.736) pos=-0.27 -0.0 0.0
Ep= 184, score=2161.325 (best=105038.736) pos=-0.27 -0.0 0.0
Ep= 185, score=2161.292 (best=105038.736) pos=-0.27 0.0 0.0
Ep= 186, score=2161.298 (best=105038.736) pos=-0.26 -0.0 0.0
Ep= 187, score=2161.300 (best=105038.736) pos=-0.26 0.0 0.0
Ep= 188, score=2161.337 (best=105038.736) pos=-0.26 -0.0 0.0
Ep= 189, score=2161.305 (best=105038.736) pos=-0.25 0.0 0.0
Ep= 190, score=2161.305 (best=105038.736) pos=-0.26 0.0 0.0
Ep= 191, score=2161.329 (best=105038.736) pos=-0.25 0.0 0.0
Ep= 192, score=2161.338 (best=105038.736) pos=-0.26 -0.0 0.0
Ep= 193, score=2161.334 (best=105038.736) pos=-0.26 -0.0 0.0
Ep= 194, score=2161.300 (best=105038.736) pos=-0.26 0.0 0.0
Ep= 195, score=2161.332 (best=105038.736) pos=-0.26 -0.0 0.0
Ep= 196, score=2161.331 (best=105038.736) pos=-0.26 -0.0 0.0
Ep= 197, score=2161.302 (best=105038.736) pos=-0.26 0.0 0.0
Ep= 198, score=2161.333 (best=105038.736) pos=-0.26 -0.0 0.0
Ep= 199, score=2161.322 (best=105038.736) pos=-0.25 -0.0 0.0
Ep= 200, score=2161.310 (best=105038.736) pos=-0.25 0.0 0.0
Ep= 201, score=2161.346 (best=105038.736) pos=-0.24 -0.0 0.0
Ep= 202, score=2161.318 (best=105038.736) pos=-0.23 0.0 0.0
Ep= 203, score=2161.342 (best=105038.736) pos=-0.22 -0.0 0.0
Ep= 204, score=2161.350 (best=105038.736) pos=-0.2 -0.0 0.0
Ep= 205, score=2161.374 (best=105038.736) pos=-0.18 -0.0 0.0
Ep= 206, score=2161.353 (best=105038.736) pos=-0.15 0.0 0.0
Ep= 207, score=2161.370 (best=105038.736) pos=-0.11 -0.0 0.0
Ep= 208, score=2161.370 (best=105038.736) pos=-0.08 -0.0 0.0
Ep= 209, score=2161.353 (best=105038.736) pos=-0.05 -0.0 0.0
Ep= 210, score=2161.319 (best=105038.736) pos=-0.01 -0.0 0.0
Ep= 211, score=2161.312 (best=105038.736) pos=0.03 -0.0 0.0
Ep= 212, score=2161.251 (best=105038.736) pos=0.06 0.0 0.0
Ep= 213, score=2161.255 (best=105038.736) pos=0.08 -0.0 0.0
Ep= 214, score=2161.175 (best=105038.736) pos=0.14 -0.0 0.0
Ep= 215, score=2161.097 (best=105038.736) pos=0.19 -0.0 0.0
Ep= 216, score=2161.012 (best=105038.736) pos=0.23 0.0 0.0
Ep= 217, score=2160.923 (best=105038.736) pos=0.27 0.0 0.0
Ep= 218, score=2160.828 (best=105038.736) pos=0.31 -0.0 0.0
Ep= 219, score=1920.682 (best=105038.736) pos=0.4 0.0 0.0
Ep= 220, score=1680.534 (best=105038.736) pos=0.42 0.0 0.0
Ep= 221, score=1680.479 (best=105038.736) pos=0.44 -0.0 0.0
Ep= 222, score=1440.387 (best=105038.736) pos=0.33 -0.0 0.0
Ep= 223, score=1440.279 (best=105038.736) pos=0.35 -0.0 0.0
Ep= 224, score=1440.166 (best=105038.736) pos=0.37 -0.0 0.0
Ep= 225, score=1440.130 (best=105038.736) pos=0.37 0.0 0.0
Ep= 226, score=1440.083 (best=105038.736) pos=0.38 -0.0 0.0
Ep= 227, score=1199.994 (best=105038.736) pos=0.33 0.0 0.0
Ep= 228, score=959.934 (best=105038.736) pos=0.3 -0.0 0.0
Ep= 229, score=959.912 (best=105038.736) pos=0.3 -0.0 0.0
Ep= 230, score=959.871 (best=105038.736) pos=0.29 0.0 0.0
Ep= 231, score=719.896 (best=105038.736) pos=0.17 0.0 0.0
Ep= 232, score=719.892 (best=105038.736) pos=0.17 -0.0 0.0
Ep= 233, score=719.878 (best=105038.736) pos=0.16 -0.0 0.0
Ep= 234, score=719.873 (best=105038.736) pos=0.15 -0.0 0.0
Ep= 235, score=719.861 (best=105038.736) pos=0.15 0.0 0.0
Ep= 236, score=719.860 (best=105038.736) pos=0.15 0.0 0.0
Ep= 237, score=719.869 (best=105038.736) pos=0.15 0.0 0.0
Ep= 238, score=719.874 (best=105038.736) pos=0.15 -0.0 0.0
Ep= 239, score=719.861 (best=105038.736) pos=0.15 0.0 0.0
Ep= 240, score=719.857 (best=105038.736) pos=0.14 0.0 0.0
Ep= 241, score=479.919 (best=105038.736) pos=0.09 -0.0 0.0
Ep= 242, score=479.920 (best=105038.736) pos=0.08 -0.0 0.0
Ep= 243, score=479.910 (best=105038.736) pos=0.08 0.0 0.0
Ep= 244, score=479.912 (best=105038.736) pos=0.07 -0.0 0.0
Ep= 245, score=479.906 (best=105038.736) pos=0.06 -0.0 0.0
Ep= 246, score=479.896 (best=105038.736) pos=0.05 0.0 0.0
Ep= 247, score=479.895 (best=105038.736) pos=0.04 0.0 0.0
Ep= 248, score=479.896 (best=105038.736) pos=0.03 -0.0 0.0
Ep= 249, score=479.893 (best=105038.736) pos=0.03 0.0 0.0
Ep= 250, score=479.891 (best=105038.736) pos=0.02 0.0 0.0
Ep= 251, score=479.889 (best=105038.736) pos=0.02 0.0 0.0
Ep= 252, score=479.889 (best=105038.736) pos=0.02 0.0 0.0
Ep= 253, score=479.889 (best=105038.736) pos=0.02 0.0 0.0
Ep= 254, score=479.886 (best=105038.736) pos=0.03 0.0 0.01
Ep= 255, score=479.888 (best=105038.736) pos=0.03 0.0 0.01
Ep= 256, score=479.885 (best=105038.736) pos=0.04 0.0 0.0
Ep= 257, score=479.876 (best=105038.736) pos=0.05 -0.0 0.0
Ep= 258, score=479.883 (best=105038.736) pos=0.05 0.0 0.0
Ep= 259, score=479.883 (best=105038.736) pos=0.05 -0.0 0.0
Ep= 260, score=479.883 (best=105038.736) pos=0.04 -0.0 0.0
Ep= 261, score=479.877 (best=105038.736) pos=0.02 0.0 0.0
Ep= 262, score=479.883 (best=105038.736) pos=0.01 -0.0 0.0
Ep= 263, score=479.884 (best=105038.736) pos=0.0 -0.0 0.0
Ep= 264, score=479.883 (best=105038.736) pos=-0.01 -0.0 0.0
Ep= 265, score=479.883 (best=105038.736) pos=-0.01 -0.0 0.0
Ep= 266, score=479.876 (best=105038.736) pos=-0.01 0.0 0.0
Ep= 267, score=479.883 (best=105038.736) pos=-0.01 -0.0 0.0
Ep= 268, score=479.876 (best=105038.736) pos=-0.02 0.0 0.0
Ep= 269, score=479.882 (best=105038.736) pos=-0.02 -0.0 0.0
Ep= 270, score=479.879 (best=105038.736) pos=-0.04 -0.0 0.0
Ep= 271, score=479.876 (best=105038.736) pos=-0.05 0.0 0.0
Ep= 272, score=479.882 (best=105038.736) pos=-0.05 0.0 0.0
Ep= 273, score=479.880 (best=105038.736) pos=-0.04 -0.0 0.01
Ep= 274, score=479.887 (best=105038.736) pos=-0.02 -0.0 0.02
Ep= 275, score=479.881 (best=105038.736) pos=0.0 -0.0 0.01
Ep= 276, score=479.877 (best=105038.736) pos=0.02 -0.0 0.0
Ep= 277, score=479.881 (best=105038.736) pos=-0.0 0.0 0.0
Ep= 278, score=479.887 (best=105038.736) pos=-0.06 0.0 0.0
Ep= 279, score=479.899 (best=105038.736) pos=-0.11 0.0 0.0
Ep= 280, score=1680.282 (best=105038.736) pos=-1.45 -0.0 0.0
Ep= 281, score=6772.175 (best=105038.736) pos=-19.22 -0.0 0.0
Ep= 282, score=103038.812 (best=105038.736) pos=-9.55 -1.8 158.34
Ep= 283, score=133261.978 (best=133261.978) pos=-9.48 0.65 52.46
Ep= 284, score=479.882 (best=133261.978) pos=0.04 0.0 0.0
Ep= 285, score=479.876 (best=133261.978) pos=0.04 -0.0 0.0
Ep= 286, score=479.876 (best=133261.978) pos=0.04 -0.0 0.0
Ep= 287, score=479.876 (best=133261.978) pos=0.04 -0.0 0.0
Ep= 288, score=479.876 (best=133261.978) pos=0.04 -0.0 0.0
Ep= 289, score=479.876 (best=133261.978) pos=0.04 0.0 0.0
Ep= 290, score=479.876 (best=133261.978) pos=0.03 0.0 0.0
Ep= 291, score=479.876 (best=133261.978) pos=0.03 0.0 0.0
Ep= 292, score=479.883 (best=133261.978) pos=0.02 -0.0 0.0
Ep= 293, score=479.876 (best=133261.978) pos=0.02 0.0 0.0
Ep= 294, score=479.878 (best=133261.978) pos=0.02 0.0 0.0
Ep= 295, score=479.876 (best=133261.978) pos=0.02 0.0 0.0
Ep= 296, score=479.876 (best=133261.978) pos=0.02 0.0 0.0
Ep= 297, score=479.876 (best=133261.978) pos=0.02 0.0 0.0
Ep= 298, score=479.883 (best=133261.978) pos=0.01 -0.0 0.0
Ep= 299, score=479.876 (best=133261.978) pos=0.01 0.0 0.0
Ep= 300, score=479.883 (best=133261.978) pos=0.01 -0.0 0.0
Ep= 301, score=479.883 (best=133261.978) pos=0.0 -0.0 0.0
Ep= 302, score=479.876 (best=133261.978) pos=0.0 0.0 0.0
Ep= 303, score=479.878 (best=133261.978) pos=-0.0 0.0 0.0
Ep= 304, score=479.883 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 305, score=479.883 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 306, score=479.883 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 307, score=479.878 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 308, score=479.882 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 309, score=479.876 (best=133261.978) pos=-0.02 0.0 0.0
Ep= 310, score=479.876 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 311, score=479.882 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 312, score=479.876 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 313, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 314, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 315, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 316, score=479.883 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 317, score=479.883 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 318, score=479.883 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 319, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 320, score=479.883 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 321, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 322, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 323, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 324, score=479.882 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 325, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 326, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 327, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 328, score=479.877 (best=133261.978) pos=-0.02 -0.0 0.0
Ep= 329, score=239.941 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 330, score=239.943 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 331, score=239.943 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 332, score=239.943 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 333, score=239.943 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 334, score=239.941 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 335, score=239.941 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 336, score=239.941 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 337, score=239.943 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 338, score=239.941 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 339, score=479.883 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 340, score=479.875 (best=133261.978) pos=-0.0 -0.0 0.0
Ep= 341, score=479.875 (best=133261.978) pos=-0.0 -0.0 0.0
Ep= 342, score=479.874 (best=133261.978) pos=-0.0 -0.0 0.0
Ep= 343, score=479.874 (best=133261.978) pos=0.0 -0.0 0.0
Ep= 344, score=479.880 (best=133261.978) pos=0.0 0.0 0.0
Ep= 345, score=479.874 (best=133261.978) pos=0.0 -0.0 0.0
Ep= 346, score=479.874 (best=133261.978) pos=0.0 -0.0 0.0
Ep= 347, score=479.874 (best=133261.978) pos=0.0 -0.0 0.0
Ep= 348, score=479.880 (best=133261.978) pos=0.0 0.0 0.0
Ep= 349, score=479.880 (best=133261.978) pos=0.0 0.0 0.0
Ep= 350, score=479.874 (best=133261.978) pos=0.01 -0.0 0.0
Ep= 351, score=479.874 (best=133261.978) pos=0.01 -0.0 0.0
Ep= 352, score=479.874 (best=133261.978) pos=0.01 -0.0 0.0
Ep= 353, score=479.881 (best=133261.978) pos=0.01 0.0 0.0
Ep= 354, score=479.880 (best=133261.978) pos=0.01 0.0 0.0
Ep= 355, score=479.874 (best=133261.978) pos=0.01 -0.0 0.0
Ep= 356, score=479.876 (best=133261.978) pos=0.0 -0.0 0.0
Ep= 357, score=479.874 (best=133261.978) pos=0.0 -0.0 0.0
Ep= 358, score=479.874 (best=133261.978) pos=0.0 -0.0 0.0
Ep= 359, score=479.874 (best=133261.978) pos=0.0 -0.0 0.0
Ep= 360, score=479.880 (best=133261.978) pos=-0.0 0.0 0.0
Ep= 361, score=479.877 (best=133261.978) pos=-0.02 -0.0 0.0
Ep= 362, score=479.882 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 363, score=479.882 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 364, score=479.882 (best=133261.978) pos=-0.02 -0.0 0.0
Ep= 365, score=479.877 (best=133261.978) pos=-0.0 0.0 0.0
Ep= 366, score=479.876 (best=133261.978) pos=0.01 0.0 0.0
Ep= 367, score=479.883 (best=133261.978) pos=0.02 -0.0 0.0
Ep= 368, score=479.876 (best=133261.978) pos=0.02 0.0 0.0
Ep= 369, score=479.878 (best=133261.978) pos=0.03 0.0 0.0
Ep= 370, score=479.883 (best=133261.978) pos=0.03 -0.0 0.0
Ep= 371, score=479.876 (best=133261.978) pos=0.03 0.0 0.0
Ep= 372, score=479.876 (best=133261.978) pos=0.04 0.0 0.0
Ep= 373, score=479.876 (best=133261.978) pos=0.04 0.0 0.0
Ep= 374, score=479.882 (best=133261.978) pos=0.04 -0.0 0.0
Ep= 375, score=479.876 (best=133261.978) pos=0.04 0.0 0.0
Ep= 376, score=479.876 (best=133261.978) pos=0.04 -0.0 0.0
Ep= 377, score=479.876 (best=133261.978) pos=0.04 -0.0 0.0
Ep= 378, score=479.876 (best=133261.978) pos=0.04 0.0 0.0
Ep= 379, score=479.876 (best=133261.978) pos=0.04 0.0 0.0
Ep= 380, score=479.876 (best=133261.978) pos=0.03 0.0 0.0
Ep= 381, score=479.876 (best=133261.978) pos=0.03 0.0 0.0
Ep= 382, score=479.876 (best=133261.978) pos=0.03 0.0 0.0
Ep= 383, score=479.876 (best=133261.978) pos=0.03 0.0 0.0
Ep= 384, score=479.876 (best=133261.978) pos=0.03 0.0 0.0
Ep= 385, score=479.876 (best=133261.978) pos=0.03 0.0 0.0
Ep= 386, score=479.876 (best=133261.978) pos=0.02 0.0 0.0
Ep= 387, score=479.876 (best=133261.978) pos=0.02 0.0 0.0
Ep= 388, score=479.876 (best=133261.978) pos=0.01 0.0 0.0
Ep= 389, score=479.876 (best=133261.978) pos=0.01 0.0 0.0
Ep= 390, score=479.876 (best=133261.978) pos=0.01 0.0 0.0
Ep= 391, score=479.876 (best=133261.978) pos=0.0 0.0 0.0
Ep= 392, score=479.877 (best=133261.978) pos=-0.0 0.0 0.0
Ep= 393, score=479.877 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 394, score=479.876 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 395, score=479.876 (best=133261.978) pos=-0.02 0.0 0.0
Ep= 396, score=479.876 (best=133261.978) pos=-0.02 0.0 0.0
Ep= 397, score=479.876 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 398, score=479.876 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 399, score=479.876 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 400, score=479.876 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 401, score=479.876 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 402, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 403, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 404, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 405, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 406, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 407, score=479.876 (best=133261.978) pos=-0.03 -0.0 0.0
Ep= 408, score=479.876 (best=133261.978) pos=-0.02 -0.0 0.0
Ep= 409, score=239.940 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 410, score=239.940 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 411, score=239.941 (best=133261.978) pos=-0.01 -0.0 0.0
Ep= 412, score=239.941 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 413, score=239.941 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 414, score=239.941 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 415, score=239.941 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 416, score=479.875 (best=133261.978) pos=-0.0 -0.0 0.0
Ep= 417, score=479.874 (best=133261.978) pos=0.0 -0.0 0.0
Ep= 418, score=479.874 (best=133261.978) pos=0.0 -0.0 0.0
Ep= 419, score=479.875 (best=133261.978) pos=0.01 -0.0 0.0
Ep= 420, score=479.876 (best=133261.978) pos=0.01 -0.0 0.0
Ep= 421, score=479.877 (best=133261.978) pos=0.0 0.0 0.0
Ep= 422, score=479.878 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 423, score=479.879 (best=133261.978) pos=-0.01 0.0 0.0
Ep= 424, score=479.880 (best=133261.978) pos=-0.02 0.0 0.0
Ep= 425, score=479.881 (best=133261.978) pos=-0.03 0.0 0.0
Ep= 426, score=479.882 (best=133261.978) pos=-0.05 0.0 0.0
Ep= 427, score=479.884 (best=133261.978) pos=-0.06 0.0 0.0
Ep= 428, score=479.888 (best=133261.978) pos=-0.08 0.0 0.0
Ep= 429, score=479.891 (best=133261.978) pos=-0.09 0.0 0.0
Ep= 430, score=479.894 (best=133261.978) pos=-0.1 0.0 0.0
Ep= 431, score=479.900 (best=133261.978) pos=-0.11 0.0 0.0
Ep= 432, score=719.845 (best=133261.978) pos=-0.28 0.0 0.0
Ep= 433, score=719.871 (best=133261.978) pos=-0.29 0.0 0.0
Ep= 434, score=14535.354 (best=133261.978) pos=-25.54 -0.42 0.0
Ep= 435, score=719.834 (best=133261.978) pos=-0.06 0.03 0.0
Ep= 436, score=719.831 (best=133261.978) pos=-0.05 0.04 0.0
Ep= 437, score=719.830 (best=133261.978) pos=-0.05 0.04 0.0
Ep= 438, score=719.829 (best=133261.978) pos=-0.04 0.04 0.0
Ep= 439, score=719.828 (best=133261.978) pos=-0.03 0.05 0.0
Ep= 440, score=719.827 (best=133261.978) pos=-0.03 0.05 0.0
Ep= 441, score=719.827 (best=133261.978) pos=-0.02 0.05 0.0
Ep= 442, score=719.826 (best=133261.978) pos=-0.02 0.05 0.0
Ep= 443, score=719.825 (best=133261.978) pos=-0.01 0.05 0.0
Ep= 444, score=719.823 (best=133261.978) pos=-0.01 0.05 0.0
Ep= 445, score=719.821 (best=133261.978) pos=-0.0 0.04 0.0
Ep= 446, score=719.819 (best=133261.978) pos=-0.0 0.04 0.0
Ep= 447, score=719.817 (best=133261.978) pos=-0.0 0.04 0.0
Ep= 448, score=479.880 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 449, score=479.880 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 450, score=479.880 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 451, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 452, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 453, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 454, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 455, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 456, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 457, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 458, score=479.877 (best=133261.978) pos=0.0 0.03 0.0
Ep= 459, score=479.877 (best=133261.978) pos=0.0 0.03 0.0
Ep= 460, score=479.877 (best=133261.978) pos=0.0 0.03 0.0
Ep= 461, score=479.877 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 462, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 463, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 464, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 465, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 466, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 467, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 468, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 469, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 470, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 471, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 472, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 473, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 474, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 475, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 476, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 477, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 478, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 479, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 480, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 481, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 482, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 483, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 484, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 485, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 486, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 487, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 488, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 489, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 490, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 491, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 492, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 493, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 494, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 495, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 496, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 497, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 498, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 499, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0
Ep= 500, score=479.878 (best=133261.978) pos=-0.0 0.03 0.0

Plot the Rewards

Once you are satisfied with your performance, plot the episode rewards, either from a single run, or averaged over multiple runs.

In [18]:
plt.plot(rewards, label='rewards')
Out[18]:
[<matplotlib.lines.Line2D at 0x7f47986d8ef0>]
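
If a single noisy curve is hard to read, a rolling mean over the saved rewards is a quick way to smooth it (a sketch; the window size of 10 is arbitrary):

import numpy as np

window = 10
smoothed = np.convolve(rewards, np.ones(window) / window, mode='valid')
plt.plot(rewards, label='episode reward')
plt.plot(np.arange(window - 1, len(rewards)), smoothed, label='rolling mean over {} episodes'.format(window))
plt.legend()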

Reflections

Question 1: Describe the task that you specified in task.py. How did you design the reward function?

Answer: The task I designed represents a take-off: the quadcopter should climb straight up until it reaches the height given by the target position. To make the agent learn this behaviour, several penalty terms were added to the reward function:

  • the sum of the Euler angles, to keep the flight vertical
  • the distance from the target
  • the difference between the velocity and the distance from the target

In addition, a constant reward is added whenever the distance to the target is below a certain threshold, and a constant reward is given for each step of flight.

The task state was also modified to include the current velocity and angular velocity.
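
A rough sketch of a reward function with these ingredients is shown below. The function name take_off_reward, the scaling constants, and the exact form of each term are illustrative assumptions, not the values used in my task.py:

import numpy as np

def take_off_reward(sim, target_pos):
    """Illustrative reward with the penalty terms described above (constants are assumptions)."""
    dist = np.linalg.norm(sim.pose[:3] - target_pos)
    speed = np.linalg.norm(sim.v)
    reward = 1.0                               # constant reward for each step of flight
    reward -= 0.01 * abs(sim.pose[3:6]).sum()  # penalize non-zero Euler angles (keep flight vertical)
    reward -= 0.01 * dist                      # penalize distance from the target
    reward -= 0.01 * abs(speed - dist)         # penalize speed mismatched to the remaining distance
    if dist < 1.0:
        reward += 10.0                         # constant bonus when close to the target
    return reward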

Question 2: Discuss your agent briefly, using the following questions as a guide:

  • What learning algorithm(s) did you try? What worked best for you?
  • What was your final choice of hyperparameters (such as $\alpha$, $\gamma$, $\epsilon$, etc.)?
  • What neural network architecture did you use (if any)? Specify layers, sizes, activation functions, etc.

Answer: The algorithm used is based on the suggested DDPG algorithm. I tried different neural network sizes, but since more complex networks did not add much value, I decided not to change the network architecture.

For the actor I used three dense hidden layers of sizes 32, 64, and 32 with ReLU activation, followed by a final layer with sigmoid activation that is scaled to the required action range using a Lambda layer.

For the critic, two paths are used, one for the state and one for the action, each with two dense hidden layers of 32 and 64 nodes and ReLU activation. The two paths are then added and passed through another ReLU activation layer. Finally, a single-node layer produces the Q-values.

Both networks use the Adam optimizer.
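
A minimal Keras sketch of networks with this shape is shown below. It is an illustration, not my exact agent code: the DDPG-specific training machinery (target networks, the actor update through the critic's action gradients, the replay buffer) is omitted, and action_low/action_high are assumed to come from the task.

from keras import layers, models, optimizers

def build_actor(state_size, action_size, action_low, action_high):
    states = layers.Input(shape=(state_size,))
    net = layers.Dense(32, activation='relu')(states)
    net = layers.Dense(64, activation='relu')(net)
    net = layers.Dense(32, activation='relu')(net)
    raw = layers.Dense(action_size, activation='sigmoid')(net)
    # Scale the sigmoid output [0, 1] to the rotor-speed range
    actions = layers.Lambda(lambda x: x * (action_high - action_low) + action_low)(raw)
    return models.Model(inputs=states, outputs=actions)

def build_critic(state_size, action_size):
    states = layers.Input(shape=(state_size,))
    actions = layers.Input(shape=(action_size,))
    net_s = layers.Dense(32, activation='relu')(states)
    net_s = layers.Dense(64, activation='relu')(net_s)
    net_a = layers.Dense(32, activation='relu')(actions)
    net_a = layers.Dense(64, activation='relu')(net_a)
    net = layers.Add()([net_s, net_a])
    net = layers.Activation('relu')(net)
    q_values = layers.Dense(1)(net)
    model = models.Model(inputs=[states, actions], outputs=q_values)
    model.compile(optimizer=optimizers.Adam(), loss='mse')
    return model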

After trying different neural network layer sizes, which did not improve results, I experimented with different combinations of the exploration noise parameters theta and sigma, the discount factor gamma, and the soft-update parameter tau for the target networks. The best results were obtained with theta=0.15, sigma=0.001, gamma=0.99, and tau=0.1.
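
For reference, the Ornstein-Uhlenbeck noise that theta and sigma parameterize, and the soft-update rule that tau controls, follow the usual DDPG recipe and can be sketched as:

import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process used to add exploration noise to the actions."""
    def __init__(self, size, mu=0., theta=0.15, sigma=0.001):
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.reset()

    def reset(self):
        self.state = np.copy(self.mu)

    def sample(self):
        dx = self.theta * (self.mu - self.state) + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state

def soft_update(local_model, target_model, tau=0.1):
    # theta_target = tau * theta_local + (1 - tau) * theta_target, per weight tensor
    new_weights = [tau * lw + (1 - tau) * tw
                   for lw, tw in zip(local_model.get_weights(), target_model.get_weights())]
    target_model.set_weights(new_weights)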

Question 3: Using the episode rewards plot, discuss how the agent learned over time.

  • Was it an easy task to learn or hard?
  • Was there a gradual learning curve, or an aha moment?
  • How good was the final performance of the agent? (e.g. mean rewards over the last 10 episodes)

Answer: I think the task of flying a quadcopter is relatively hard, because it requires good coordination of four different actions. In my task, the vertical take-off, this difficulty is slightly reduced: once the agent learns to give the four rotors the same power, the quadcopter always remains close to the vertical axis. The agent does very well at not moving horizontally, but it always climbs too high (about 160 instead of 100) and it does not learn to correct this even after ending up at the exact same position hundreds of times. I reduced the height error from about 250 down to 160 by adding a penalty term that links velocity to the distance from the target, but it needs to be improved further.

Question 4: Briefly summarize your experience working on this project. You can use the following prompts for ideas.

  • What was the hardest part of the project? (e.g. getting started, plotting, specifying the task, etc.)
  • Did you find anything interesting in how the quadcopter or your agent behaved?

Answer: The hardest part of the project was definitely finding a good reward function. I tried adding many different penalty terms for the Euler angles, velocity, angular velocity, and so on, but it then became difficult to balance them against the positive reward, and as a result the quadcopter did not take off at all. So I decided to keep the reward function as simple as possible and to balance positive rewards and penalties using different scaling constants.

Looking at the plots below, you can see that the trained agent has essentially only vertical velocity, and that this velocity increases only up to a certain point, after which it remains constant. The problem is that it should instead reduce its vertical speed once it approaches the target.

Another interesting observation, looking at the agent's action plot, is that only rotors 1 and 2 are used, while rotors 3 and 4 stay at around 0 revolutions per second.