CS 1571: Homework 4 (programming)

Naive Bayes: Spam Detection (100 pts)

Assigned: November 15, 2017

Due: December 6, 2017

In this assignment, you will create a Naive Bayes classifier for detecting e-mail spam, and you will test your classifier on a publicly available spam dataset using 5-fold cross-validation.
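As a rough illustration of the 5-fold cross-validation setup, the sketch below splits example indices into five disjoint folds and yields one train/test split per fold. The function names, the fixed seed, and the index-based representation are assumptions made for this example; they are not part of the assignment.

```python
import random


def k_fold_indices(n_examples, k=5, seed=0):
    """Split indices 0..n_examples-1 into k disjoint folds of near-equal size."""
    indices = list(range(n_examples))
    # Shuffle so the folds do not depend on the order of the data file.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_examples, k=5, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    folds = k_fold_indices(n_examples, k, seed)
    for i, test_idx in enumerate(folds):
        # Training data is everything outside the held-out fold.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

Each example lands in exactly one test fold, so every example is used for testing once and for training four times.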

I. Implement Naive Bayes in Python, Java, or C#.

II. Evaluate your results.

  1. Error tables: Create a table with one row per fold showing your false positive, false negative, and overall error rates, and end each table with a final row giving the average of each error rate across all folds. For this problem, the false positive rate is the fraction of non-spam testing examples that are misclassified as spam, the false negative rate is the fraction of spam testing examples that are misclassified as non-spam, and the overall error rate is the fraction of all testing examples that are misclassified.
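The three error rates defined above can be computed as in the following minimal sketch. It assumes labels are encoded as 1 for spam and 0 for non-spam, and the function names are hypothetical; the guards that return 0.0 for an empty class are also an assumption for the example.

```python
def error_rates(true_labels, predicted_labels):
    """Return (false_positive_rate, false_negative_rate, overall_error_rate).

    Assumed label encoding: 1 = spam, 0 = non-spam.
    """
    fp = fn = n_spam = n_nonspam = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1:
            n_spam += 1
            if pred == 0:
                fn += 1  # spam misclassified as non-spam
        else:
            n_nonspam += 1
            if pred == 1:
                fp += 1  # non-spam misclassified as spam
    total = n_spam + n_nonspam
    fp_rate = fp / n_nonspam if n_nonspam else 0.0
    fn_rate = fn / n_spam if n_spam else 0.0
    overall = (fp + fn) / total if total else 0.0
    return fp_rate, fn_rate, overall


def average_rates(per_fold_rates):
    """Average each of the three rates across folds (the table's final row)."""
    n = len(per_fold_rates)
    return tuple(sum(r[i] for r in per_fold_rates) / n for i in range(3))
```

Calling `error_rates` once per test fold gives the per-fold rows, and `average_rates` on the list of those results gives the final row.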

Updated 11/28: See courseweb for detailed instructions regarding What to Submit (code and report).
Updated 11/30: See courseweb for detailed instructions regarding Grading Criteria (to allow you to get partial credit).