Project 5: Second Learning Project

Executive Summary

This is your second learning project for the semester. Expect roughly 10-15 hours of coding. As with the previous project, you will need to think about the task at a high level before you start coding. Collecting the data you need to demonstrate learning will also take several hours. As with all of the projects, graduate students have additional requirements.

By the end of this project, you will have accomplished the following objectives.

Designed an ML agent using either genetic algorithms (GAs) or reinforcement learning (RL) for use in a simulated real-world environment
Implemented RL or GAs to learn in real time while interacting with your environment
Demonstrated that RL or GAs can improve the actions of your agent

Overview video

Project 5 tasks and ideas

As with project 4, this project is somewhat open-ended but with constraints. Your task is to implement either an RL or a GA method to improve the behavior of your agent. Some ideas are listed below. You are welcome to use any of these ideas, but you must propose your plan in part 1 (just as with project 4). We want to help make sure your ideas are feasible!

Some ideas to get you thinking:

Use genetic algorithms to decide the best breakpoints for each decision in a heuristic agent. For example, instead of always going to beacons if energy is less than a constant, learn what the different constants should be for the different choices. This would probably be a better choice for the cooperative environment rather than the CTF competitive environment.
Use GAs or RL to create effective behavior (this would work in either environment; it just needs to handle multiple agents in CTF). For example, you create a set of high-level behaviors such as “go to flag”, “go to nearest asteroid”, and “go to home base”, and then you learn when to choose each behavior to maximize your score.
Use RL to train a policy for a specific action such as shooting an opponent (best in CTF/competitive) or navigating without hitting obstacles (either environment).
Use RL or GAs to train a moveToObject action that performs efficiently (and does not suffer from the extreme slow-downs that PD controllers do).
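To make the first idea concrete, here is a minimal sketch of a chromosome that stores one real-valued cutoff per heuristic decision, with Gaussian mutation and uniform crossover. The class name, the gene layout, and the choice of operators are all illustrative assumptions. None of it comes from the Space Settlers code base, and other operators would work just as well.

```java
import java.util.Random;

/** Hypothetical chromosome: one real-valued cutoff per heuristic decision. */
final class ThresholdChromosome {
    // Assumed layout, e.g. genes[0] = energy level below which the agent seeks a beacon.
    final double[] genes;

    ThresholdChromosome(double[] genes) {
        this.genes = genes.clone();
    }

    /** Random individual with every gene drawn uniformly from [min, max). */
    static ThresholdChromosome random(int length, double min, double max, Random rng) {
        double[] g = new double[length];
        for (int i = 0; i < length; i++) {
            g[i] = min + rng.nextDouble() * (max - min);
        }
        return new ThresholdChromosome(g);
    }

    /** Gaussian mutation: each gene is perturbed with probability rate, then clamped to [min, max]. */
    ThresholdChromosome mutate(double rate, double sigma, double min, double max, Random rng) {
        double[] g = genes.clone();
        for (int i = 0; i < g.length; i++) {
            if (rng.nextDouble() < rate) {
                double perturbed = g[i] + rng.nextGaussian() * sigma;
                g[i] = Math.max(min, Math.min(max, perturbed));
            }
        }
        return new ThresholdChromosome(g);
    }

    /** Uniform crossover: each gene is copied from one of the two parents with equal probability. */
    ThresholdChromosome crossover(ThresholdChromosome other, Random rng) {
        double[] g = new double[genes.length];
        for (int i = 0; i < g.length; i++) {
            g[i] = rng.nextBoolean() ? genes[i] : other.genes[i];
        }
        return new ThresholdChromosome(g);
    }
}
```

In an agent, each gene would replace a hard-coded constant in a heuristic rule, and the fitness of a chromosome would come from the scores of the games played with it.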

A few notes to help make this doable:

RL is easiest if you have a discrete state and action space. To make this much easier, you should propose an approach to discretizing the state and action space in your proposal. This approach can be very simple (gridding the variables you need for the task you chose) or it could rely on other methods such as clustering.
GAs can handle a continuous state space, depending on your problem definition. If you choose the idea of using GAs to learn the best behavior cutoffs, that is definitely best done with a continuous state space! If you choose the state -> action idea (learning a behavior policy), that is best done with a discrete state space, and I would also say it works much better with RL than with GAs. Each has its advantages!
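As one illustration of the gridding approach, the sketch below bins two continuous variables into a single discrete state index and applies the standard tabular Q-learning update with epsilon-greedy action selection. The class name, the choice of variables (a distance and an energy level), and all parameters are assumptions made for the example. They are not part of the Space Settlers code base.

```java
import java.util.Random;

/** Tabular Q-learner over a gridded state space (all names are illustrative). */
final class GridQLearner {
    private final double[][] q;   // q[state][action], initialized to 0
    private final double alpha;   // learning rate
    private final double gamma;   // discount factor
    private final Random rng;

    GridQLearner(int numStates, int numActions, double alpha, double gamma, Random rng) {
        this.q = new double[numStates][numActions];
        this.alpha = alpha;
        this.gamma = gamma;
        this.rng = rng;
    }

    /** Maps a continuous value to one of `bins` equal-width bins over [min, max]. */
    static int bin(double value, double min, double max, int bins) {
        double clamped = Math.max(min, Math.min(max, value));
        int index = (int) ((clamped - min) / (max - min) * bins);
        return Math.min(index, bins - 1); // value == max belongs to the last bin
    }

    /** Combines two bin indices into one state index (row-major order). */
    static int stateIndex(int distanceBin, int energyBin, int energyBins) {
        return distanceBin * energyBins + energyBin;
    }

    /** Epsilon-greedy: explore with probability epsilon, otherwise act greedily. */
    int chooseAction(int state, double epsilon) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(q[state].length);
        }
        return greedyAction(state);
    }

    /** Action with the highest current Q-value (lowest index wins ties). */
    int greedyAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) {
                best = a;
            }
        }
        return best;
    }

    /** Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a)). */
    void update(int state, int action, double reward, int nextState) {
        double maxNext = q[nextState][greedyAction(nextState)];
        q[state][action] += alpha * (reward + gamma * maxNext - q[state][action]);
    }

    double value(int state, int action) {
        return q[state][action];
    }
}
```

With 4 distance bins and 3 energy bins, for example, the table has 12 states. Keeping the grid this coarse keeps the table small enough to fill in with a limited number of games.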

CS 5013 students need to implement two learning methods, but one of these can be from the traditional methods in project 4. For example, you could use RL or GAs AND make use of your previous learning, you could use clustering to create a state space, or you could find some other way to combine two methods. You could also implement two RL or GA tasks.

Implementation details

There is an example GA client that does NOT implement learning; it just shows you one possible way to implement a GA (a policy-based approach). Look at spacesettlers.clients.examples to see this. You can use any of this code, but you can also make your own GA (and remember, this one does not implement ANY learning; it just shows how one could implement a GA).
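For orientation, the learning step that a GA needs on top of a chromosome representation might look like the sketch below: evaluate every individual, keep the best one unchanged (elitism), and fill the rest of the next generation with mutated offspring of tournament-selected parents. This is not the example client's code. The class name, the use of plain double[] genomes, and the operator choices are assumptions for illustration, and the fitness function here stands in for whatever score you collect from playing games.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** One generational step of a simple GA over real-valued genomes (illustrative only). */
final class GenerationSketch {
    private GenerationSketch() {}

    /** Binary tournament: pick two individuals at random and return the fitter one. */
    static double[] tournament(List<double[]> population, double[] fitness, Random rng) {
        int a = rng.nextInt(population.size());
        int b = rng.nextInt(population.size());
        return fitness[a] >= fitness[b] ? population.get(a) : population.get(b);
    }

    /** Builds the next generation; higher fitness is assumed to be better. */
    static List<double[]> nextGeneration(List<double[]> population,
                                         ToDoubleFunction<double[]> fitnessFn,
                                         double mutationSigma,
                                         Random rng) {
        // Evaluate everyone once and remember the best individual.
        double[] fitness = new double[population.size()];
        int best = 0;
        for (int i = 0; i < population.size(); i++) {
            fitness[i] = fitnessFn.applyAsDouble(population.get(i));
            if (fitness[i] > fitness[best]) {
                best = i;
            }
        }

        List<double[]> next = new ArrayList<>();
        next.add(population.get(best).clone()); // elitism: the best survives unchanged

        while (next.size() < population.size()) {
            double[] mother = tournament(population, fitness, rng);
            double[] father = tournament(population, fitness, rng);
            int cut = rng.nextInt(mother.length + 1); // one-point crossover position
            double[] child = new double[mother.length];
            for (int i = 0; i < child.length; i++) {
                double inherited = i < cut ? mother[i] : father[i];
                child[i] = inherited + rng.nextGaussian() * mutationSigma; // Gaussian mutation
            }
            next.add(child);
        }
        return next;
    }
}
```

Because the elite individual is copied over unchanged, the best fitness in the population never decreases between generations as long as the fitness function is deterministic. Game scores are usually noisy, so averaging over several games per individual may be needed before that property shows up in practice.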

Class-wide ladders – Extra credit

The extra credit ladders remain the same as the previous projects. You are welcome to choose a different ladder path than you chose for either of the previous projects. The class-wide ladders will start on Nov 20, 2023.

The extra credit opportunities for being creative and finding bugs remain the same as in Project 1. Remember you have to document it in your writeup to get the extra credit!

Part 1: Due Nov 17 11:59 PM

Turn in a ONE-page project proposal on canvas. This proposal must:

Fully specify your proposed method(s) (still limited to one page even if you have two methods). This includes a description of your state space and actions if you are doing RL, or of how you will implement your chromosome if you are doing GAs.
Say how you will collect your learning data (e.g. by playing games in the ladder).
Specify how the method will be integrated back into your agent to improve its performance.

Part 2: Due Dec 1 11:59 PM

Update your code from the last project.
Write your learning code as described above
Build and test your code using the ant compilation system
Submit your project on spacesettlers.cs.nor.ou.edu using the submit script as described below. You can submit as many times as you want and we will only grade the last submission.
- Submit ONLY the writeup to the correct Project 5 on canvas:
  - Project 5 for CS 4013
  - Project 5 for CS 5013
- Submit your file using one of the following commands (be sure your java files come last). You can submit to only ONE ladder. If you submit to both, small green monsters will track you down and deal with you appropriately.

/home/spacewar/bin/submit --config_file spacesettlersinit.xml \
--project project5_coop \
--java_files *.java

/home/spacewar/bin/submit --config_file spacesettlersinit.xml \
--project project5_compete \
--java_files *.java

- After the project deadline, the above commands will not accept submissions. If you want to turn in your project late, use one of the following:

/home/spacewar/bin/submit --config_file spacesettlersinit.xml \
--project project5_coop_late \
--java_files *.java

/home/spacewar/bin/submit --config_file spacesettlersinit.xml \
--project project5_compete_late \
--java_files *.java

Rubric – Part 1 Due Nov 17 11:59pm

10 points for turning in a writeup as specified above

Rubric – Part 2 Due Dec 1 11:59pm

Reinforcement learning or genetic algorithms
- 30 points for correctly implementing the RL or GA method that you proposed and got feedback on (if your feedback told you to choose a different method, you need to implement the method you were told to switch to). A correct learner uses learning to improve performance, and you must demonstrate that learning in the writeup with a learning curve (the curve itself is graded separately). Learning code should be well documented to receive full credit.
- 25 points if there is only one minor mistake.
- 20 points if there are several minor mistakes or if documentation is missing.
- 15 points if you have a major mistake
RL: State space or GA: Chromosome
- 10 points for a state space representation that is appropriate to the task being solved and is correctly implemented. If you implemented GAs, 10 points for the representation of the chromosome that is appropriate to the task being solved and is correctly implemented.
- 5 points for bugs
Reward or fitness function
- 10 points for a reward (RL) or fitness (GA) function that is appropriate to the task being solved and is correctly implemented
- 5 points for bugs
Graphics
- 10 points for correctly drawing graphics (or using printouts) that enable you to debug your learning and that help us to grade it.
- 7 points for drawing something useful for debugging and grading but with bugs in it
- 3 points for major graphical/printing bugs
CS 5013 students only: You must EITHER implement a second RL/GA task or integrate something from your project 4 agent into your project 5 agent as well. Both learning methods must be documented in the writeup.
- 20 points for correctly implementing a second RL/GA task or integrating a learning method from project 4 into project 5 and documenting with a learning curve and paragraph describing it in the writeup
- 10 points if you implement it but do not give a second learning curve
- 5 points for bugs
Good coding practices: We will randomly choose one of the following good coding practices to grade for these 10 points. Note that this will be included on every project. Are your files well commented? Are your variable names descriptive (or are they all i, j, and k)? Do you make good use of classes and methods or is the entire project in one big flat file? This will be graded as follows:
- 10 points for well commented code, descriptive variable names, or making good use of classes and methods
- 5 points if you have partially commented code, semi-descriptive variable names, or partial use of classes and methods
- 0 points if you have no comments in your code, variables are obscurely named, or all your code is in a single flat method
Writeup: 30 points total. Your writeup is limited to 2 pages maximum. Any writeup over 2 pages will be automatically given a 0. Turn your writeup in to canvas and your code into spacesettlers.
- 20 points for collecting data and demonstrating learning using a learning curve (in the writeup). For full credit, make sure you explain why it is learning or not learning (if it isn’t learning, you will not lose your points if you can explain WHY it is not learning)
- 10 points for describing your RL or GA approach in a paragraph or two, including your state space and reward function (for RL) or your chromosome, fitness function, mutation, and crossover (for GAs)
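
One possible way to collect the data for a learning curve is to record a score after every game and plot a moving average, since raw per-game scores tend to be noisy. The sketch below is a minimal example under that assumption. The class name, the CSV column names, and the idea of one score per game are illustrative choices, not requirements of the project.

```java
import java.util.ArrayList;
import java.util.List;

/** Collects one score per game and exports a smoothed learning curve (illustrative only). */
final class LearningCurve {
    private final List<Double> scores = new ArrayList<>();

    /** Call once at the end of each game (or each generation, for a GA). */
    void record(double score) {
        scores.add(score);
    }

    /** Mean of up to `window` scores ending at index `end` (inclusive, zero-based). */
    double movingAverage(int end, int window) {
        int start = Math.max(0, end - window + 1);
        double sum = 0.0;
        for (int i = start; i <= end; i++) {
            sum += scores.get(i);
        }
        return sum / (end - start + 1);
    }

    /** CSV text with a header row, ready to paste into a plotting tool. */
    String toCsv(int window) {
        StringBuilder sb = new StringBuilder("game,score,moving_average\n");
        for (int i = 0; i < scores.size(); i++) {
            sb.append(i + 1).append(',')
              .append(scores.get(i)).append(',')
              .append(movingAverage(i, window)).append('\n');
        }
        return sb.toString();
    }
}
```

Plotting the moving_average column against the game number gives the curve. A flat or falling curve is still useful evidence for the writeup if you can explain why the agent is not learning.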