CSCI 446
No matter which technique(s) you choose, remember that you should use a portion of your data for training and a portion for testing. The technique(s) you choose should be able to read in a training data set and produce a structure that represents what it has learned. It should also be able to read in a test data set (without the outcome variable) and predict what that outcome should be. You will probably want to work out some way of measuring how successful the predictions were, for example the percentage of test instances predicted correctly.

Techniques:
- Instance Based Learning (kNN)
- Clustering (k-Means)
- Rule Based Learning
- Decision Trees
- Artificial Neural Networks
- Genetic Algorithms
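The train/test workflow described above can be sketched as follows, using instance-based learning (kNN) from the list as the example technique. This is a minimal sketch under stated assumptions: numeric features stored as tuples, a 70/30 split, k = 3, and Euclidean distance. The helper names `train_test_split`, `knn_predict`, and `accuracy` are hypothetical (this `train_test_split` is not scikit-learn's function of the same name). For kNN the "learned structure" is simply the stored training set; the predictor sees only the features, and success is measured as the fraction of test rows predicted correctly. It needs Python 3.8+ for `math.dist`.

```python
import math
import random
from collections import Counter

# Hypothetical helpers for illustration only (not part of the assignment).
# Each row is a (features, label) pair; features is a tuple of numbers.


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows; the first test_fraction of it becomes the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training, test)


def knn_predict(training, features, k=3):
    """Predict a label by majority vote among the k training rows closest in Euclidean distance."""
    nearest = sorted(training, key=lambda row: math.dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(training, test, k=3):
    """Fraction of test rows whose true label matches the prediction made from features alone."""
    correct = sum(
        1
        for features, label in test
        if knn_predict(training, features, k) == label  # label withheld from the predictor
    )
    return correct / len(test)


if __name__ == "__main__":
    # Toy data: two well-separated 5x5 grids of points labeled "A" and "B".
    data = [((x, y), "A") for x in range(5) for y in range(5)]
    data += [((x + 10, y + 10), "B") for x in range(5) for y in range(5)]
    training, test = train_test_split(data, test_fraction=0.3, seed=1)
    print(f"{len(training)} training rows, {len(test)} test rows")
    print(f"{accuracy(training, test) * 100:.1f}% correct")
```

On this toy data it reports 35 training rows, 15 test rows, and 100.0% correct, because the two groups never overlap; real datasets will not be this clean. The same split-then-score loop applies to any of the listed techniques if `knn_predict` is swapped for that technique's prediction step.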
1. Electronic Version of Source Code
2. Compilation Instructions
3. Run Instructions
4. If you modified the datasets to work with your code, send me the modified versions also.
5. A Description of what you did and the results:
   A. Tell me how you treated the data (how much you used for training and testing, whether you discretized, whether you normalized, etc.).
   B. Tell me any assumptions you made in your algorithm (e.g., did you use pre-pruning or post-pruning to compensate for noise?).
   C. Tell me the results of running your algorithm on the test data: what percentage did your approach get correct, etc.

Note: this implies that the structure you built must be usable for predicting an outcome; that is, decision trees or rules must be executable in some form. Worst case, you can produce a set of rules or a tree and manually walk through the test data, but if you do this, tell me that this is what you did. Also, tell me about any datasets your code will not work on, e.g., if it won't work with numeric data or with missing data.
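Item A asks how you treated the data. If you normalize or discretize, one common practice worth reporting is computing the ranges from the training portion only and then applying them unchanged to the test portion. The sketch below assumes numeric features held in tuples with the outcome column already removed; min-max scaling and equal-width binning are just two common choices, not requirements of the assignment, and the names `fit_min_max`, `apply_min_max`, and `equal_width_bin` are hypothetical.

```python
# Hypothetical helpers for illustration only (not part of the assignment).
# Each row is a tuple of numeric feature values (outcome column removed).


def fit_min_max(rows):
    """Return a (min, max) pair per column, computed from the training rows only."""
    return [(min(column), max(column)) for column in zip(*rows)]


def apply_min_max(rows, ranges):
    """Rescale values toward [0, 1] using the training ranges; a constant column maps to 0.0.

    Test values outside the training range land outside [0, 1]; that is expected.
    """
    return [
        tuple(
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, (lo, hi) in zip(row, ranges)
        )
        for row in rows
    ]


def equal_width_bin(value, lo, hi, n_bins=3):
    """Map a value to a bin index 0..n_bins-1 using equal-width intervals over [lo, hi]."""
    if hi <= lo:
        return 0
    index = int((value - lo) / (hi - lo) * n_bins)
    return min(max(index, 0), n_bins - 1)  # clamp values outside the training range


if __name__ == "__main__":
    training_rows = [(1.0, 10.0), (3.0, 20.0), (5.0, 30.0)]
    ranges = fit_min_max(training_rows)
    print(ranges)
    print(apply_min_max([(3.0, 25.0)], ranges))
    print([equal_width_bin(v, 1.0, 5.0) for v in (1.0, 2.0, 3.5, 5.0, 7.0)])
```

Here the ranges are `[(1.0, 5.0), (10.0, 30.0)]`, the row `(3.0, 25.0)` scales to `(0.5, 0.75)`, and the five values fall into bins `[0, 0, 1, 2, 2]`. Clamping keeps test values that exceed the training range in the outermost bin rather than creating a bin the learner never saw.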
Page last updated: August 17, 2018