Project 4: First Learning Project

Executive Summary

This is the first learning project of the semester. I estimate 10-15 hours of coding, BUT I want to tell you right up front that it takes a lot of time to think through conceptually before you work out the coding! I expect this to be the most challenging project: even though the next project is also about learning, this one is harder conceptually because you have to think about how to do learning, how to collect data, etc. Also note that you will need a small amount of additional time to actually produce learning curves. As with all of the projects, graduate students have additional requirements.

By the end of this project, you will have accomplished the following objectives.

Designed an ML agent for use in a simulated real-world environment
Implemented an approach to collect data to train your chosen ML method
Implemented your chosen ML method
Brought the ML method into your spacesettlers agent to positively change its behavior

Overview

Overview video

Project 4 task

This project will focus on learning, specifically methods from modules 5-6.

Your job is to create a spacesettlers agent that uses machine learning to control some aspect of its behavior. You can choose a task and appropriate method from module 6.
- I realize this is VERY open-ended and some of you may not be comfortable with that. Please see some suggestions below!!
- Because it is so open-ended, you MUST propose your learning task to us and get feedback on it! We want to make sure you don’t choose an impossible task.
For the coop ladder, the environment will remain the same as it has for all previous projects EXCEPT you can control multiple ships now if you want.
For the compete ladder, based on the vote in class, we will use the Capture The Flag environment. It is basically the same environment you have been using, except that fixed asteroids outline the places where flags can spawn. Your task is to collect your opponent’s flag and bring it back to your base, and you must control multiple ships on your team. See the link for the full rules!

Quick ideas to get you started

Build a decision tree or random forest to predict the success of firing at an enemy ship – you can track the bullet for its full lifetime and see if it succeeded in getMovementEnd by examining the ship hits or kills. Then choose your probability of firing based on success. Do NOT do this if you play in cooperative!
Build a decision tree or random forest to predict the success of an action (e.g. collect a beacon or collect an asteroid), given different characteristics of the asteroid or other target. Choose the action most likely to succeed.
Build a logistic regression classifier to do either of the two suggestions above (e.g. probability of firing or action success)
Use logistic regression or a decision tree or random forest to estimate probability of winning in 3D Tic Tac Toe OR use regression or kernel regression or KNN to estimate game board value and prune your search using these estimates (or use it in rollouts to help create a better opponent). These are not needed in 2D TTT since the board is so simple but they could be useful in the 3D version.
Use clustering to create a better set of vertices for A* search (rather than just hard-coded ones). This needs to be something beyond clustering on x,y space! For example, cluster in a higher-dimensional space that allows for intelligent search, perhaps using asteroid density etc.
Use regression or kernel regression or clustering to estimate the value of a target based on some higher-dimensional characteristics and use the output to decide which one to go after. Don’t do this in compete!
Use clustering to find ideal locations in a higher-dimensional space for new bases. This works for coop or compete.
The ideas are endless! You do not have to choose from this list, it is just to help you get thinking.
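For the decision tree ideas above, the core step with real-valued attributes is choosing a split threshold. Here is a minimal sketch, assuming binary success/failure labels and information gain as the split criterion (one common choice, not necessarily the one used in the course videos). The class name, method names, and sample data are hypothetical and not part of spacesettlers; it uses no ML libraries.

```java
import java.util.Arrays;

/** Sketch: pick a threshold for one real-valued attribute by information gain. */
class ThresholdSplit {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Binary entropy (in bits) of a set with `positives` successes out of `total`. */
    static double entropy(int positives, int total) {
        if (total == 0 || positives == 0 || positives == total) {
            return 0.0; // empty or pure sets carry no uncertainty
        }
        double p = (double) positives / total;
        return -(p * log2(p) + (1 - p) * log2(1 - p));
    }

    /** Information gain of splitting the examples on value <= threshold. */
    static double informationGain(double[] values, boolean[] labels, double threshold) {
        int n = values.length;
        int leftTotal = 0, leftPos = 0, totalPos = 0;
        for (int i = 0; i < n; i++) {
            if (labels[i]) totalPos++;
            if (values[i] <= threshold) {
                leftTotal++;
                if (labels[i]) leftPos++;
            }
        }
        int rightTotal = n - leftTotal;
        int rightPos = totalPos - leftPos;
        // Weighted entropy that remains after the split.
        double remainder = ((double) leftTotal / n) * entropy(leftPos, leftTotal)
                + ((double) rightTotal / n) * entropy(rightPos, rightTotal);
        return entropy(totalPos, n) - remainder;
    }

    /** Tries the midpoints between consecutive distinct values (array must be non-empty). */
    static double bestThreshold(double[] values, boolean[] labels) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double best = sorted[0];
        double bestGain = -1.0;
        for (int i = 0; i + 1 < sorted.length; i++) {
            if (sorted[i] == sorted[i + 1]) continue; // no boundary between equal values
            double candidate = (sorted[i] + sorted[i + 1]) / 2.0;
            double gain = informationGain(values, labels, candidate);
            if (gain > bestGain) {
                bestGain = gain;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical data: distance to the target when firing, and whether the shot hit.
        double[] distance = {50, 80, 120, 200, 260, 400};
        boolean[] hit = {true, true, true, false, false, false};
        double t = bestThreshold(distance, hit);
        System.out.println("best threshold = " + t
                + ", gain = " + informationGain(distance, hit, t));
    }
}
```

A full tree would repeat this over every attribute, split on the attribute and threshold with the highest gain, and recurse on each side until the leaves are pure or too small.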

Things not to do

You cannot choose neural networks or deep learning. They are simply much too complicated and well beyond the scope of the class. You also can’t pick genetic algorithms or reinforcement learning as those will be the next project! You must pick a method from module 6!
You CAN implement your learning in a language other than java but you can NOT use existing ML libraries! No use of scikit-learn for example!! This will earn you an automatic 0. You can make use of existing non-ML libraries such as matrix libraries (e.g. numpy!)

Project proposal

To keep you from going astray on something that likely will not work, you have two due dates: the first is to propose the learning method(s) and task(s) and the second is the regular project deadline. See below and canvas for the separate due dates.
Note that CS 5013 students must propose and implement two learning methods.

How Do I… FAQ and other helpful videos

I made a FAQ for this project over the past few years and filmed some videos (and have some notes!) to help make your project more successful! Please watch before you go implement! Keep in mind most of these videos were filmed last year so I was using a different IDE but the ideas are very much the same!

How do I choose a task and learning method to match the task? How do I know the task will be successful?
- Short answer: Try to pick a task that you can’t already solve by search (e.g. A* already does optimal search so why replace that?). Pick a task where learning can find a strategy that you can’t simply hard-code.
- I talk a bit about this in last year’s video below but I’ve added a new one this year also.
Ok, I picked a task and I need to figure out how to implement learning. How do I get started? How do I save data to an external file so I can be more successful at learning?
- In order to learn anything, you will need to collect a LOT of data. Use the initialize() function to open a file handle, save whatever data you need for learning during the agent’s lifetime, and then use the shutdown() function to close the file handle. There is a video on how to do this below!
- I shot this video last year as part of a sequence of videos but it is very applicable here and talks about how to save the files and gives you examples in the code.
How do I run my agent enough times to collect data? It is SOOOOOO slow to run with graphics and I need thousands of examples!
- Fear not! You can use the ladder!
Now that I saved data, how do I write my learning agent?
- You can implement the learning offline, outside of the spacesettlers system, so long as you can read your model back into your agent and use it. Note, you can use other languages also but you CAN NOT use existing ML libraries, no matter what language you choose!
Once I have trained my agent, how do I use the learning agent inside my spacesettlers agent? This is both technical (how do I load it back in?) and philosophical (how do I use it?).
Can I have multiple ships?
- While not required, you are allowed to have multiple ships this project if you want!
I’m overwhelmed – can I have an example using a decision tree?
- Yes! I shot this video sequence last year but it very much still applies! Ignore the parts about “I’ve already approved your proposals” because this was for last year’s class. But if you want to think about how to do learning, this is a good start.
I want to do trees but I have no idea how to do a real-valued tree. Help!
- This is again part of last year’s video sequence but it should help a lot with real-valued trees! Note there are two videos to watch for this sequence, first choosing the best attribute and second an example.
Feel free to ask more FAQs in slack. I will reply there and update this document and even add extra videos if they are needed.
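For the data-saving question above, here is a minimal sketch of the open / record / close pattern, assuming you want one CSV row per training example and want rows from many games to accumulate in the same file. The class name, the column names, and the recorded features are all hypothetical; in a real agent you would call open() from initialize() and close() from shutdown(), and the main method below only simulates two short games.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch: append one CSV row per training example, across many games. */
class TrainingDataLogger {
    private final Path file;
    private BufferedWriter writer;

    TrainingDataLogger(Path file) {
        this.file = file;
    }

    /** Intended to be called from the agent's initialize(): opens the file in append mode. */
    void open() throws IOException {
        boolean isNew = !Files.exists(file) || Files.size(file) == 0;
        writer = Files.newBufferedWriter(file,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        if (isNew) {
            // Hypothetical feature columns; write the header only once.
            writer.write("distance,targetSpeed,energy,success");
            writer.newLine();
        }
    }

    /** Call whenever the outcome of an example is known (e.g. the action succeeded or failed). */
    void record(double distance, double targetSpeed, double energy, boolean success)
            throws IOException {
        writer.write(distance + "," + targetSpeed + "," + energy + "," + (success ? 1 : 0));
        writer.newLine();
    }

    /** Intended to be called from the agent's shutdown(): flushes and closes the file. */
    void close() throws IOException {
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("training", ".csv");
        // Simulate two separate games writing to the same file.
        for (int game = 0; game < 2; game++) {
            TrainingDataLogger logger = new TrainingDataLogger(path);
            logger.open();
            logger.record(120.0, 15.5, 3000.0, game == 0);
            logger.close();
        }
        Files.readAllLines(path).forEach(System.out::println);
    }
}
```

Appending rather than overwriting is what lets repeated ladder runs build up thousands of examples in one file that an offline learner can read later.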

Extra credit

The extra credit ladders remain the same as with all previous projects. You are welcome to choose a different ladder path than you chose for either of the previous projects. The class-wide ladders will start on Oct 30 2023.
- Note: the EC for the CTF ladder requires beating the heuristic for CTF as well as being at the top of the ladder (same rules as before, just new heuristic to compete against).
The extra credit opportunities for being creative and finding bugs remain the same as in the previous projects. Remember you have to document it in your writeup to get the extra credit!

Part 1: Due Oct 27 11:59 PM

Turn in a ONE page project proposal on canvas here. This proposal must:

Fully specify your proposed method(s) (still limited to one page even if you have two methods).
It should say what kind of data you will collect and how
It should specify how the method will be implemented back into your agent to improve the performance

Part 2: Due Nov 10 11:59 PM

Update your code from the last project. You can update your code at the command line with “git pull”. If you did not get the code checked out for project 0, follow the instructions to check out the code in Project 0.
Remember: if you are using the compete choice, you MUST switch to the captureTheFlagCompetitive config files and use the ant target spacesettlers-ctf-compete.
Write your learning code as described above
Build and test your code using the ant compilation system and your chosen IDE.
Submit your project on spacesettlers.cs.ou.edu using the submit script as described below. You can submit as many times as you want and we will only grade the last submission.
- Submit ONLY the writeup to the correct Project 4 on canvas: Project 4 for CS 4013 and Project 4 for CS 5013
- Copy your code from your laptop to spacesettlers.cs.nor.ou.edu using the account that was created for you for this class (your username is your 4x4 and your password is the one you chose in project 0). You can copy using scp or winscp or pscp.
- ssh into spacesettlers.cs.nor.ou.edu
- Make sure your working directory contains all the files you want to turn in. All files should live in the package 4x4. Note: The spacesettlersinit.xml file is required to run your client!
- Submit your file using one of the following commands (be sure your java files come last). You can submit to only ONE ladder. If you submit to both, small green monsters will track you down and deal with you appropriately.

/home/spacewar/bin/submit --config_file spacesettlersinit.xml \
--project project4_coop \
--java_files *.java

/home/spacewar/bin/submit --config_file spacesettlersinit.xml \
--project project4_compete \
--java_files *.java

- After the project deadline, the above command will not accept submissions. If you want to turn in your project late, use:

/home/spacewar/bin/submit --config_file spacesettlersinit.xml \
--project project4_coop_late \
--java_files *.java

/home/spacewar/bin/submit --config_file spacesettlersinit.xml \
--project project4_compete_late \
--java_files *.java

Rubric – Part 1 Due Oct 27 11:59 PM

10 points for project proposal
- 10 points for turning in a proposal that:
  - Fully specifies your proposed method(s) (still limited to one page even if you have two methods).
  - It should say what kind of data you will collect and how
  - It should specify how the method will be implemented back into your agent to improve the performance
- 0 points for not turning in a proposal (You WANT to turn in a proposal: it keeps you from picking a project that will not work!)

Rubric – Part 2 Due Nov 10 11:59 PM

Learning
- 40 points for correctly implementing the learning method that you proposed and got feedback on (if you were told to choose a different method, you need to implement the method you were told to adjust to). A correct learner uses learning in a way that improves performance, and you demonstrate the learning in the writeup using a learning curve (though the curve is graded separately). Learning code should be well documented to receive full credit.
- 35 points if there is only one minor mistake.
- 30 points if there are several minor mistakes or if documentation is missing.
- 25 points if you have one major mistake.
- 10 points if you accidentally implement a learning algorithm other than what you intended and it at least moves the ships around the environment in an intelligent manner.
NOTE: if you implement your learning code in a language other than java, make sure you still turn it in using submit for grading! We cannot grade what we cannot see!!
Graphics
- 10 points for correctly drawing graphics (or using printouts) that enable you to debug your learning and that help us to grade it.
- 7 points for drawing something useful for debugging and grading but with bugs in it
- 3 points for major graphical/printing bugs
CS 5013 students only: You must implement a second learning method and document it in the writeup
- 20 points for correctly implementing a second learning method and documenting with a learning curve and paragraph describing it in the writeup
- 10 points if you implement it but do not give a second learning curve
- 5 points for bugs
Good coding practices: We will randomly choose from one of the following good coding practices to grade for these 10 points. Note that this will be included on every project. Are your files well commented? Are your variable names descriptive (or are they all i, j, and k)? Do you make good use of classes and methods or is the entire project in one big flat file? This will be graded as follows:
- 10 points for well commented code, descriptive variable names or making good use of classes and methods
- 5 points if you have partially commented code, semi-descriptive variable names, or partial use of classes and methods
- 0 points if you have no comments in your code, variables are obscurely named, or all your code is in a single flat method
Writeup: 30 points total. Your writeup is limited to 2 pages maximum. Any writeup over 2 pages will be automatically given a 0. Turn your writeup in to canvas and your code into spacesettlers.
- 20 points for collecting data and demonstrating learning using a learning curve (in the writeup). For full credit, make sure you explain why it is learning or not learning (if it isn’t learning, you will not lose your points if you can explain WHY it is not learning)
- 10 points for describing your learning method in a paragraph or two and explaining why you chose to demonstrate learning in the curve that you present (e.g. I graphed decision trees by the number of leaf nodes to show overfitting or I graphed regression by RMSE over iterations to show it lowered error over time.)
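As one way to produce the numbers behind a learning curve like the RMSE example above, here is a minimal sketch that trains on growing amounts of data and reports error on held-out examples. It assumes a one-dimensional least-squares regression as a stand-in learner and uses synthetic data; the class name, method names, and data are hypothetical, and you would substitute your own learner and the data your agent collected.

```java
import java.util.Random;

/** Sketch: learning curve for 1-D least-squares regression (test RMSE vs. training size). */
class LearningCurve {

    /** Fits y = slope * x + intercept on the first n examples; returns {slope, intercept}. */
    static double[] fit(double[] x, double[] y, int n) {
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = (sxx == 0) ? 0 : sxy / sxx; // guard against all-equal x values
        return new double[] {slope, meanY - slope * meanX};
    }

    /** Root mean squared error of the model on the given examples. */
    static double rmse(double[] model, double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double error = model[0] * x[i] + model[1] - y[i];
            sum += error * error;
        }
        return Math.sqrt(sum / x.length);
    }

    /** Fills the arrays with synthetic data: y = 2x + 1 plus Gaussian noise. */
    static void fill(double[] x, double[] y, Random rng) {
        for (int i = 0; i < x.length; i++) {
            x[i] = rng.nextDouble() * 10;
            y[i] = 2 * x[i] + 1 + rng.nextGaussian();
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] trainX = new double[256], trainY = new double[256];
        double[] testX = new double[100], testY = new double[100];
        fill(trainX, trainY, rng);
        fill(testX, testY, rng);

        // One row per training-set size; paste into a spreadsheet to plot the curve.
        System.out.println("trainingSize\ttestRMSE");
        for (int n = 2; n <= trainX.length; n *= 2) {
            double[] model = fit(trainX, trainY, n);
            System.out.println(n + "\t" + rmse(model, testX, testY));
        }
    }
}
```

Measuring error on examples the learner never trained on is what makes the curve show learning rather than memorization.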