EC Fall 2020 Assignment Series 1 FAQ
Assignment 1a
- Random search is NOT expected to perform well. That is okay! You are not being graded on how well your algorithm solves the puzzle.
- Many students get overexcited about the problem and want to write complicated heuristic-based algorithms with problem-specific knowledge to generate solutions. DO NOT DO THIS! We want only naive random generation. Save your enthusiasm for later assignments in the series! The intention here is to demonstrate that EC is a valuable method for solving challenging problems where you may or may not have problem-specific knowledge and algorithms.
- This problem was intentionally chosen because it has a relatively simple specification and is intuitive (and fun!). That has the unfortunate consequence of making problem-specific algorithms easy to create. EC does not, in general, require problem-specific knowledge -- that's the entire point of a black-box search algorithm. The goal of this introductory class is not to have you create state-of-the-art machine learning algorithms that can compete with hand-made heuristics.
Assignment 1b
- This very basic EA is not expected to perform particularly well. Again, that is okay! It should do better than your 1a algorithm, but you are still not being graded on how well it solves the puzzle.
- Many students have historically been very confused about various terminology used, so take a moment to read this very carefully:
- Evaluation: The process of scoring a solution, whether or not it ends up being valid.
- Individual: A specific member of the population; its genotype encodes a solution; for static problems such as this one, every individual should be evaluated once and only once, whether or not it is a direct copy of a previously-evaluated individual and regardless of how many generations it survives.
- Generation: One cycle of evolution where a specific number of individuals compete with each other for parent and survival selection, and which reproduce with one another.
- Run: A single, independent execution of the EA which starts with initializing a population of individuals and then iterates over multiple generations until the specified termination criterion has been met.
- Experiment: For statistical purposes, a set of runs sharing the same experimental parameters, and a single random seed to allow the experiment to be exactly reproduced.
- Following from these definitions, you should be able to come up with a simple formula to calculate how many generations your EA should contain in a run that terminates after a set number of evaluations, using the following variables:
- the number of individuals created by random initialization for your first (or zero-th) generation (μ)
- the number of individuals created by reproduction for each subsequent generation (λ)
- the number of evaluations before termination (n)
- Use this formula to double-check that your EA is running for the correct number of generations, to make sure you're counting evaluations correctly!
Assignment 1c
- This EA may or may not perform better than your previous submissions. Penalty functions can improve results, but they can also make things worse. As always, you are not being graded on how well your algorithm solves the puzzle.
- Your statistical analysis must be based on only valid solutions. An invalid solution should never be compared to the results from 1a or 1b. Invalid solutions are only to be used to help your EA find valid solutions -- they are not an actual solution on their own.
- If you are implementing a repair function R, it needs to be designed in such a way that the output R(x) is the closest valid solution to the invalid solution x. What exactly defines "closest" is up to you, but must be logically justified. For example, randomly generating new solutions until one is valid is NOT a repair function. You have a lot of freedom here to design it as you wish; if in doubt, ask a TA if your repair function makes sense.
Assignment 1d
- Remember that the level of non-domination is now your official fitness metric. All comparison between individuals should be done using this new value. Your subfitnesses are used only to calculate this value.
- However, your statistical analysis should be based on the subfitnesses. While the level of non-domination is how you should compare individuals in the same generation with each other, it loses all meaning between generations (and between runs).
- Similarly, if you implement crowding or fitness sharing, remember that these lose their meaning outside of the run. Statistical analysis should be performed on the unmodified objective values.