Eric Williams
08/06/02
Fixes:
    Minimum coverage and minimum positive coverage are now consistent: both
    interpret values less than 1 as percentages.
Changes:
    Importing new training data no longer resets the EG method, CF method,
    and thresholds.
Features added:
    There is an additional import option: attribute definition file.
    Create that file as a tab-delimited text file with the following setup:

        attrib1  attrib2  attrib3  ...  attribN
        tag1     tag2     tag3     ...  tagN
        type1    type2    type3    ...  typeN
        use1     use2     use3     ...  useN

    tag is a semantic tag for use in constraints. Use "none" as a filler
    if needed.
    type is either discrete, continuous, or set.
    use is either input, output, identification, or ignore.

    IMPORTANT NOTE: Attribute type information will override automatic type
    detection. Be sure you know what you're doing.

Eric Williams
08-02-02
Features added:
    cost-based learning - works as follows
        There is an additional import option: misclassification cost file.
        Create that file as a tab-delimited text file with the following
        setup:

            target_class  predicted_value  actual_value  cost_of_error

        cost_of_error is a floating point value that represents the penalty
        incurred for making that prediction mistake. The default penalty is
        1.0, which avoids divide-by-zero errors when calculating the CF to
        average cost ratio. All of the evidence gathering methods that
        normally select based on CF will now do so based on that ratio.

        IMPORTANT NOTE: Right now, I can only guarantee correct functionality
        for discrete target classes. Be certain that discrete targets are
        not numerical (e.g., turn 4 into G4 or four).

    all-pairs evaluation evidence gathering method
        An EG method previously used only for my testing is now available.
        It uses all of the other methods and compares them. The comparison
        results can only be seen if running the program from within an IDE
        or from the command line.

Eric Williams
07-15-2002
Fixes:
    All the evidence gathering methods work the way they're supposed to
    (Don't ask).
Changes:
    After running comparisons of all the evidence gathering methods,
    weighted voting is now the default. Also, some methods have been removed
    due to poor performance.

Eric Williams
06-05-2002
Fixes:
    I believe that the choking problem has been fixed, or at the very least
    reduced. I ran a 146-attribute, ~2300-case dataset and it didn't crash
    like it would with the 03-26-02 code. It still took a very long time,
    though, so one of my goals for the next version is to make the code more
    efficient.
Changes:
    1. I am using a new version numbering system. I estimate that JavaRL is
       about 80% complete, so this version is 0.8. The old numbering system
       remains for previous versions. To avoid ambiguity, all old version
       numbers say "(old numbering)" after them. When the new version 1.0
       comes along, no mention will be made of the old system.
    2. The Beam class (Beam.java) is now deprecated. I have created a
       PriorityQueue interface. The only implementation thus far is a binary
       heap (HeapPriorityQueue.java). The SAL code (SAL.java) as well as the
       evidence gathering code had to be modified to accommodate the new
       data structure. Several private classes were made public in separate
       files. SAL.java was just too large and unwieldy to have so many
       classes stored within it. Besides, it's better coding practice to put
       classes into separate files.
Features added:
    1. New evidence gathering method: "Most specific AND highest CF rule"
       (MostSpecificSingleBest.java). From several "best" rules, it picks
       the one with the most conjuncts AND the highest CF. P-value is used
       as a tie-breaker. Note that this is different from
       SingleBestSpecific.
Other comments & recommendations:
    I suggest using Borland JBuilder 6 Professional ($100 through Pitt Book
    Center) as a development environment. It's a lot faster, more stable,
    and more intuitive than Forte.
    The changes I have made and will make in the future are drawn from the
    needs/wants list at
    http://www.cs.pitt.edu/~edwst7/JavaRL/JavaRL_bugs.txt.
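For readers unfamiliar with the "Most specific AND highest CF rule" method from the 06-05-2002 entry, the selection it describes might look roughly like the sketch below. This is not the code in MostSpecificSingleBest.java. The class MostSpecificSketch, the CandidateRule stand-in, and its fields are hypothetical names made up for illustration. The sketch also assumes one particular reading of "AND": conjunct count is compared first, then CF, and a smaller p-value wins any remaining tie.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only; not the actual MostSpecificSingleBest.java source.
final class MostSpecificSketch {

    // Minimal stand-in for a learned rule (illustrative fields, assumed names).
    static final class CandidateRule {
        final String label;
        final int conjuncts;   // number of conjuncts in the rule
        final double cf;       // certainty factor
        final double pValue;   // p-value, used only as a tie-breaker

        CandidateRule(String label, int conjuncts, double cf, double pValue) {
            this.label = label;
            this.conjuncts = conjuncts;
            this.cf = cf;
            this.pValue = pValue;
        }
    }

    // Assumed ordering: more conjuncts first, then higher CF, then lower p-value.
    static final Comparator<CandidateRule> PREFERENCE =
        Comparator.comparingInt((CandidateRule r) -> r.conjuncts).reversed()
            .thenComparing(Comparator.comparingDouble((CandidateRule r) -> r.cf).reversed())
            .thenComparingDouble(r -> r.pValue);

    // Returns the most preferred rule under PREFERENCE; rejects an empty list.
    static CandidateRule pickBest(List<CandidateRule> candidates) {
        return candidates.stream()
            .min(PREFERENCE)
            .orElseThrow(() -> new IllegalArgumentException("no candidate rules"));
    }

    public static void main(String[] args) {
        List<CandidateRule> candidates = Arrays.asList(
            new CandidateRule("A", 2, 0.95, 0.01),
            new CandidateRule("B", 3, 0.80, 0.04),
            new CandidateRule("C", 3, 0.90, 0.03),
            new CandidateRule("D", 3, 0.90, 0.01));
        // C and D tie on conjuncts and CF; D has the smaller p-value.
        System.out.println(pickBest(candidates).label); // prints "D"
    }
}
```

If the real method weighs specificity and CF differently, only the PREFERENCE comparator in this sketch would need to change.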

Will Bridewell
03-26-2002
Updates:
    1. P-values are now correctly calculated as the right tail.
       The p-values are calculated by a child of CertaintyFactor, but are
       not included in the CF list. There are notes in CertaintyFactor.java
       explaining why.
           (Rule.java)
           (CertaintyFactor.java)
           (PValueRight.java)
           (PValue.java)
    2. Larger datasets are now usable.
           (SAL.java)
    3. Fixed class-checks in equals method of the attribute value nodes.
           (DiscreteNode.java)
           (IntervalNode.java)
           (SetNode.java)

Will Bridewell
03-02-2002
General Changes:
    0. Minor bug fixes when I came across them.
           ExpertPanel defaults are now consistent
           Information gain is calculated for the results w/o user action
           Error messages if the user tries to specify multiple target/id
           fields
    1. Replaced String concatenation with StringBuffer.append when
       appropriate
    2. Fixed misspellings of continuous and hierarchy
    3. Attempted to rearrange comments based on Java conventions
    4. Attempted to reorganize/rename files based on Java conventions
    5. Some classes had implemented the Serializable interface; I removed
       this implementation in many cases.
    6. Added the ability to specify an attribute for ID.
           There is currently no check for uniqueness of the ID field
           There is no automatic creation of IDs if a field is not specified
    7. Added the date and file name to exported results and rule files
    8. Added attribute definitions to the exported rules file
           This only contains attributes used in the rules for various
           reasons
    9. Changed default file extensions for exported files
    10. The value-hierarchy editor has been updated to be more useful
Problems:
    0. Do a full path search for RED FLAG or YELLOW FLAG to see notes on
       places that may be problem areas. I think most of these are gone by
       now--or the yellow flags aren't important.
    1. Export and import of rules/attributes/values/etc. doesn't really
       work. If you try to use these features, you will receive an error
       pop-up.
    2. Prior rules are handled in the simplest possible way--see notes in
       SAL.processPriorRules().
Functionality to add:
    1. The ApplyRules section is really beta code. It has minimum
       functionality. Some notes are contained in
           ApplyRulesPanel.java
           ApplyRulesHandler.java
           RawResult.java
       My comments on this section aren't detailed, and it may not be clear
       what direction I wanted to go with this functionality. If you want
       to extend it, you may want to talk to me.
    2. We need to formally store results, parameters and other session
       information for each application of RL. All information is currently
       available, but it isn't in one place, and it isn't all stored after
       the run is finished. This is crucial for proper export/import
       functionality as well as creating a more useful version of rule
       application.
    3. Eventually, we ought to figure out where to put the help files to
       make them accessible.
Suggestions in the order that they should be addressed:
    1. Test the SAL learning algorithm and fix any major bugs. I am 95%
       confident that SAL works as planned, but that's not 100%.
    2. Fix GUI bugs as they show themselves, as these are usually simple.
    3. Extend processing of prior rules during learning.
    4. Create the object to store session information.
    5. Fix export/import functionality.
    6. Improve rule application.
    7. Make a list of desired features.
Other comments & recommendations:
    I'm familiar with almost all of the code by now and can give valuable
    input if needed. If you would like to go over any of the code with me,
    just make an appointment.

    I suggest using Sun's Forte for Java development environment. It takes
    some time to familiarize yourself with it, but unlike JBuilder Personal
    Edition, you are not restricted from distributing developed software,
    and unlike Microsoft Visual J++, the JDK can be kept up-to-date. The
    community edition of Forte is free, and it's feature-rich.

    The Help files require access to the JavaHelp libraries. Even when these
    were available, I had problems consistently getting the help system to
    initialize. Someone should figure this out. Also, someone should update
    the help files as needed.