Transfer Learning in Deep Q-Networks
Training on Cartesius
For this project I ran all programs on Cartesius, the Dutch national supercomputer. You need a SURFsara account to be able to do this.
I logged into my account on Cartesius and cloned a Theano-based implementation of Deep Q-Learning:
ssh username@doornode.surfsara.nl
git clone https://github.com/spragunr/deep_q_rl
cd deep_q_rl
Now we are inside the deep_q_rl/ directory, which contains all the files we need. Next, we want to create a job.sh file that contains the instructions for what we want to run:
vim job.sh
Paste this in the file:
#!/bin/bash
#SBATCH -p gpu
#SBATCH -N 1
#SBATCH -t 40:00:00

# Load the toolchain that deep_q_rl needs on Cartesius.
module load cuda
module load cudnn
module load torch7
module load opencv/gnu/2.4.10
module load python/2.7.11

# Tell Theano to use the GPU with float32 and start training on Pong.
THEANO_FLAGS='device=gpu,floatX=float32' srun -u python deep_q_rl/run_nips.py --rom pong.bin
Save and quit the file. Then make this file executable by running:
chmod +x job.sh
Next we need to modify the original dep_script.sh into something we can run. We copy the code involving the installation of pylearn2 and ALE and paste it into a new file called install_dep.sh. Now install_dep.sh looks like this:
#!/bin/bash

# Install pylearn2 in development mode for the current user.
git clone git://github.com/lisa-lab/pylearn2.git
cd ./pylearn2
python setup.py develop --user
cd ..

# Build the Arcade Learning Environment (ALE) and install its Python module.
git clone https://github.com/mgbellemare/Arcade-Learning-Environment.git ALE
cd ./ALE
mkdir build && cd build
cmake -DUSE_SDL=ON -DUSE_RLGLUE=OFF -DBUILD_EXAMPLES=ON ..
make -j 4
cd ..
pip install --user .
cd ..
Then we make this file executable by running the same chmod command:
chmod +x install_dep.sh
Now we’re ready to install pylearn2 and ALE. Let’s do it:
module load cuda cudnn torch7 opencv/gnu/2.4.10 python/2.7.11
./install_dep.sh
Next, navigate to the ALE/ directory that was just created and copy the file ale.cfg into the main deep_q_rl/ directory:
cd ALE/
cp ale.cfg ../
We’re almost there. Now navigate to the roms/ directory. There is a link there for ROMs that you need to download; in our case we want to train the DQN to play Pong, so we need the Pong ROM file. However, Pong was originally released under the name Video Olympics, which I didn’t know, and the code doesn’t refer to it by that name either. It took some time to figure this out, so be sure of the game’s original cartridge name, and also check what name the code expects by looking at the supported games in:
cd ~/deep_q_rl/ALE/src/games/supported/
In here you can list all the cpp and hpp files for the Atari 2600 games that the code is able to run. If the cpp/hpp file for your game is not there, you either need to accept that the game isn’t supported or try to write your own cpp/hpp files; I don’t have any experience with that. We can see that, for example, Pong.hpp exists. If we open this file, with vim for example, we can see that it contains the line:
// the rom-name
const char* rom() const { return "pong"; }
This is the name you need to use in the following two files to be able to run the code. Do:
vim ~/deep_q_rl/job.sh
and edit the line so it says:
THEANO_FLAGS='device=gpu,floatX=float32' srun -u python deep_q_rl/run_nips.py --rom pong.bin
Then we need to set BASE_ROM_PATH so the program knows where your ROM files are located, and set the ROM name to pong.bin:
vim ~/deep_q_rl/deep_q_rl/run_nips.py
and edit the line so it says:
BASE_ROM_PATH = "/home/username/deep_q_rl/roms/"
ROM = 'pong.bin'
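Before submitting a 40-hour job it is worth checking that the ROM is actually where run_nips.py expects it and that ALE can load it. Below is a minimal sketch of such a check (save it as, say, check_rom.py and run it with the modules loaded as before); it assumes the ale_python_interface module installed by install_dep.sh is importable and that the path matches your run_nips.py settings:
# check_rom.py -- sanity check before submitting the long training job.
# Minimal sketch; assumes ale_python_interface (installed by install_dep.sh)
# is importable and that the path below matches your run_nips.py settings.
import os
from ale_python_interface import ALEInterface

BASE_ROM_PATH = "/home/username/deep_q_rl/roms/"  # same value as in run_nips.py
ROM = 'pong.bin'

rom_path = os.path.join(BASE_ROM_PATH, ROM)
if not os.path.exists(rom_path):
    raise IOError("ROM not found: " + rom_path)

# Let ALE load the ROM to make sure the file itself is valid.
ale = ALEInterface()
ale.loadROM(rom_path)
print "Loaded", rom_path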
Now we are ready:
cd ~/deep_q_rl/
sbatch job.sh
squeue -u <username>
If everything is running smoothly, the output should look something like this:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2099600 gpu job.sh username R 00:00:05 1 gcn18
After the job has finished running, you have a trained model!! Hooray!!
If we check the contents of deep_q_rl/ we can see that there is a new folder called pong_xx-xx-xx-xx_xxxxxx_xxxx, the x’s being numbers that are generated automatically. This folder contains everything about our trained model, including the pickled network files and a results.csv with training statistics that we will use below. I will rename this folder to pong-nips:
mv ~/deep_q_rl/pong_xx-xx-xx-xx_xxxxxx_xxxx ~/deep_q_rl/pong-nips
Playing on Cartesius
After about a day you should have a trained network for your game. (Training the larger network from the Nature paper with run_nature.py takes about 4 to 5 days on Cartesius.) Let’s watch it play! Log in to your SURFsara account through the terminal like this:
ssh -CY username@cartesius.surfsara.nl
Note that you have to be connecting from a whitelisted IP address. You can send an e-mail to the SURFsara helpdesk to get your IP whitelisted.
Make a .theanorc file in your Cartesius home directory containing the following:
[global]
floatX = float32
device = gpu
allow_gc = False
We also need to make a sleepjob.sh file in order to reserve a GPU node on which we can later run Python code:
#!/bin/bash
#SBATCH -p gpu
#SBATCH -N 1
#SBATCH -t 1:00:00
sleep 1200
All this job does is sleep for 20 minutes, keeping a node reserved for us to work on. Next, submit it:
sbatch sleepjob.sh
Now go to:
cd ~/deep_q_rl/deep_q_rl/
vim ale_run_watch.py
Because we trained using run_nips.py, ale_run_watch.py also needs to use that file, so make the change:
:s/run_nature/run_nips/
:wq
Now run squeue to find the node that sleepjob.sh is running on:
squeue -u username
This returns:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2099600 gpu sleepjob username R 00:00:05 1 gcn18
We need gcn18. Now ssh into this node with the -CY flags:
ssh -CY gcn18
Now load the modules necessary:
module load cuda cudnn torch7 opencv/gnu/2.4.10 python/2.7.11
And run the file:
cd ~/deep_q_rl/deep_q_rl/
python ale_run_watch.py ../pong-nips/network_file_99.pkl
And voila, you can see your trained network playing Pong!!!
To see a graph of the training statistics stored in results.csv, run:
python plot_results.py ../pong-nips/results.csv
This produces a plot of the training results.
To visualize the filters in the first layers of the trained network:
python plot_filters.py ../pong-nips/network_file_99.pkl
This shows the learned filters.
Doing transfer learning
Let’s do some transfer learning!!!
Method 1: Train a trained network
Let’s see what happens if we take our trained Pong network and just keep training it, but on a different game. We will first try this with Breakout.
cp ~/deep_q_rl/pong-nips/network_file_99.pkl ~/deep_q_rl/deep_q_rl/trained_pong.pkl
vim job.sh
Edit the line in job.sh such that it reads:
THEANO_FLAGS='device=gpu,floatX=float32' srun -u python deep_q_rl/run_nips.py --rom breakout.bin --nn-file deep_q_rl/trained_pong.pkl
Also edit run_nips.py so that it has the correct ROM: ROM = 'breakout.bin'. Check that ~/deep_q_rl/roms/ contains the breakout.bin ROM; if not, download it like we did with the Pong ROM.
Now do the following:
cd ~/deep_q_rl/
sbatch job.sh
Unfortunately this doesn’t work, as there is an incompatibility between the models with regard to the action space: different games have different sets of legal actions (see the sketch after this step for a way to check this). So, slight change of plans: first train a network to play Breakout from scratch, and then follow the same steps to load that network when training on Pong. Your job.sh file then reads
THEANO_FLAGS='device=gpu,floatX=float32' srun -u python deep_q_rl/run_nips.py --rom pong.bin --nn-file deep_q_rl/trained_breakout.pkl
and then run sbatch job.sh again.
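To see where the clash comes from, you can compare the minimal action sets that ALE reports for the two games; the network’s output layer has one unit per legal action, so its size differs between games. Below is a minimal sketch, assuming ale_python_interface is importable and both ROMs are in ~/deep_q_rl/roms/:
# compare_actions.py -- compare the legal action sets of two Atari games.
# Minimal sketch; assumes ale_python_interface is importable and that both
# ROMs live in ~/deep_q_rl/roms/.
import os
from ale_python_interface import ALEInterface

ROM_DIR = os.path.expanduser("~/deep_q_rl/roms/")

for rom in ["pong.bin", "breakout.bin"]:
    ale = ALEInterface()
    ale.loadROM(os.path.join(ROM_DIR, rom))
    actions = ale.getMinimalActionSet()
    print "%s: %d legal actions: %s" % (rom, len(actions), list(actions))
If the two counts differ, a network trained on one game cannot simply be loaded as the starting network for the other.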
I also trained a Ms. Pac-Man model and tried to get it to play Pong and Breakout, but it is also incompatible.
Method 2: Only copy the 2 convolutional layers from the trained network
No idea how to do this. Work in progress.
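One possible direction (untested, just a sketch): the networks in this implementation are built with Lasagne (see q_network.py), so their parameters can be read with lasagne.layers.get_all_param_values() and written back with lasagne.layers.set_all_param_values(). The idea would be to copy only the parameter arrays of the two convolutional layers from the trained Pong network into a freshly started network for the target game, leaving the game-specific layers untouched. The attribute name l_out for the output layer, the breakout-nips file name, and the assumption that each convolutional layer contributes exactly two parameter arrays (W and b) are my guesses, so check q_network.py for the real names before trusting this:
# transfer_conv.py -- sketch: copy the conv-layer weights of one network into another.
# Untested. Run from ~/deep_q_rl/deep_q_rl/ (like ale_run_watch.py) so that
# unpickling can find the network class. Assumes the pickled object exposes its
# Lasagne output layer as `l_out` and that each conv layer has two parameter
# arrays (W and b); check q_network.py for the actual attribute names.
import cPickle

import lasagne

with open("../pong-nips/network_file_99.pkl", "rb") as f:
    source = cPickle.load(f)   # trained Pong network
with open("../breakout-nips/network_file_1.pkl", "rb") as f:
    target = cPickle.load(f)   # freshly started network for the target game (placeholder name)

src_params = lasagne.layers.get_all_param_values(source.l_out)
tgt_params = lasagne.layers.get_all_param_values(target.l_out)

# Two conv layers x (W, b) = the first 4 parameter arrays; keep the rest of the target.
n_copy = 4
lasagne.layers.set_all_param_values(target.l_out, src_params[:n_copy] + tgt_params[n_copy:])

with open("transferred_convs.pkl", "wb") as f:
    cPickle.dump(target, f, -1)
The resulting pickle could then be passed to run_nips.py with --nn-file, exactly as in Method 1.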