For this homework we'll be using WEKA, a machine learning package written in Java. It implements a large variety of learners and is set up to run experiments automatically using cross-validation.
You can install WEKA and the Java runtime environment from the WEKA site.
It's very easy to run. Here's an example (Linux command line), assuming the training data is in file.arff:
java weka.classifiers.rules.ZeroR -t file.arff
This will run ZeroR (zero rules) on the "file.arff" file, show the learned model, and evaluate it using cross-validation.
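To make the evaluation step concrete, here is a minimal Python sketch of k-fold cross-validation applied to a majority-class predictor in the spirit of ZeroR. It is only an illustration under stated assumptions: the function names and the toy labels are hypothetical, the folds are formed by simple round-robin assignment, and WEKA's own procedure (for instance, how it shuffles and stratifies folds) may differ.

```python
from collections import Counter


def train_majority(labels):
    """Return the most frequent label, which is all a ZeroR-style model stores."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels, k=10):
    """Plain (unstratified) k-fold accuracy of the majority-class predictor."""
    n = len(labels)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of instances")
    correct = 0
    for fold in range(k):
        # Instance i is tested in fold i % k; all other instances train the model.
        test_idx = [i for i in range(n) if i % k == fold]
        train = [labels[i] for i in range(n) if i % k != fold]
        prediction = train_majority(train)
        correct += sum(1 for i in test_idx if labels[i] == prediction)
    # Every instance is tested exactly once, so divide by n.
    return correct / n


# Hypothetical toy labels (not the labor data): 7 "good" and 3 "bad".
toy_labels = ["good"] * 7 + ["bad"] * 3
accuracy = cross_validate(toy_labels, k=5)
print(accuracy)
```

Each instance is used for testing exactly once and for training in the remaining folds, so the reported accuracy is never measured on data the model was fit to.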
You can also download extra datasets from the UCI Machine Learning Repository in the WEKA arff file format (the datasets are described here).
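If you have not seen an ARFF file before, the sketch below shows the basic layout (a relation name, attribute declarations, then comma-separated data rows) together with a small Python reader. The weather-style sample and the `parse_arff` helper are hypothetical illustrations, and the reader handles only simple dense files: no quoted values, sparse rows, or other features of the full format.

```python
SAMPLE_ARFF = """\
% Lines starting with % are comments.
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Parse a simple dense ARFF string into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            # After @data, every line is one instance.
            rows.append([value.strip() for value in line.split(",")])
            continue
        lower = line.lower()  # ARFF keywords are case-insensitive
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation, len(attributes), len(rows))
```

By default WEKA treats the last attribute (here `play`) as the class to predict.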
Run the following three classifiers on the labor data included with WEKA (in data/labor.arff):
This is the decision tree classifier. It is based on C4.5.
java weka.classifiers.trees.j48.J48 -t data/labor.arff
This is a boosted version of the same decision tree, using AdaBoostM1.
java weka.classifiers.meta.AdaBoostM1 -W weka.classifiers.trees.j48.J48 -t data/labor.arff
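For intuition about what boosting adds, here is a rough Python sketch of the AdaBoost.M1 reweighting loop. It is not WEKA's implementation: it assumes two classes labeled +1 and -1, and it boosts one-dimensional decision stumps rather than J48 trees to keep the example short. All function names and the toy data are hypothetical.

```python
import math


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def stump_predict(stump, x):
    _, threshold, polarity = stump
    return polarity if x <= threshold else -polarity


def adaboost_m1(xs, ys, rounds=10):
    """Return a list of (vote_weight, stump) pairs."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform instance weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(xs, ys, weights)
        err = stump[0]
        if err >= 0.5:  # weak learner no better than chance: stop
            break
        if err == 0:  # perfect weak learner: use it alone and stop
            if not ensemble:
                ensemble.append((1.0, stump))
            break
        beta = err / (1.0 - err)
        # Shrink the weights of correctly classified instances, then renormalize,
        # so the next round concentrates on the instances that were missed.
        weights = [w * beta if stump_predict(stump, x) == y else w
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
        ensemble.append((math.log(1.0 / beta), stump))
    return ensemble


def predict(ensemble, x):
    """Weighted vote of the stumps; each votes with weight log(1/beta)."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data that no single stump can classify perfectly.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_m1(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
print(predictions)
```

On this toy data the best single stump misclassifies two of the six points, while the weighted vote of three stumps fits all six. That is the effect to look for when comparing the J48 and AdaBoostM1 results.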
This is a strawman algorithm that always picks the majority class.
java weka.classifiers.rules.ZeroR -t data/labor.arff
Please submit hardcopies and electronic versions of your experiments, showing the input and output of WEKA, as well as the answers to the above questions. Bring the hardcopies to class, and submit electronically following the class submission policies.
IMPORTANT NOTE: Points will be deducted if the submission procedure is not followed.