SPAM> (result-type '(:FILE #p"foo" :type ham :classification ham :score 0))
CORRECT
SPAM> (result-type '(:FILE #p"foo" :type spam :classification spam :score 0))
CORRECT
SPAM> (result-type '(:FILE #p"foo" :type ham :classification spam :score 0))
FALSE-POSITIVE
SPAM> (result-type '(:FILE #p"foo" :type spam :classification ham :score 0))
FALSE-NEGATIVE
SPAM> (result-type '(:FILE #p"foo" :type ham :classification unsure :score 0))
MISSED-HAM
SPAM> (result-type '(:FILE #p"foo" :type spam :classification unsure :score 0))
MISSED-SPAM
Having this function makes it easy to slice and dice the results of test-classifier in a variety of ways. For instance, you can start by defining predicate functions for each type of result.
(defun false-positive-p (result)
  (eql (result-type result) 'false-positive))

(defun false-negative-p (result)
  (eql (result-type result) 'false-negative))

(defun missed-ham-p (result)
  (eql (result-type result) 'missed-ham))

(defun missed-spam-p (result)
  (eql (result-type result) 'missed-spam))

(defun correct-p (result)
  (eql (result-type result) 'correct))
With those functions, you can easily use the list and sequence manipulation functions I discussed in Chapter 11 to extract and count particular kinds of results.
SPAM> (count-if #'false-positive-p *results*)
6
SPAM> (remove-if-not #'false-positive-p *results*)
((:FILE #p"ham/5349" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9999983107355541d0)
 (:FILE #p"ham/2746" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.6286468956619795d0)
 (:FILE #p"ham/3427" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9833753501352983d0)
 (:FILE #p"ham/7785" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9542788587998488d0)
 (:FILE #p"ham/1728" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.684339162891261d0)
 (:FILE #p"ham/10581" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9999924537959615d0))
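Because each result is a plist, the same sequence functions combine naturally with GETF. As a minimal sketch, the following helper returns the highest-scoring results that satisfy a predicate, which is handy for finding the most confidently wrong classifications. The name highest-scoring is hypothetical and not part of the original code; it assumes only that each result carries a numeric :score, as in the output above.

```lisp
;; HIGHEST-SCORING is a hypothetical helper, not part of the original code.
;; It assumes each result is a plist with a numeric :SCORE entry.
(defun highest-scoring (predicate results &optional (n 3))
  "Return up to N results satisfying PREDICATE, highest :score first."
  ;; SORT is destructive and REMOVE-IF-NOT may share structure with its
  ;; argument, so sort a fresh copy to leave RESULTS untouched.
  (let ((matches (sort (copy-list (remove-if-not predicate results))
                       #'>
                       :key (lambda (result) (getf result :score)))))
    (subseq matches 0 (min n (length matches)))))
```

You could call it as (highest-scoring #'false-positive-p *results* 2) to see the two false positives the filter was most sure about.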
You can also use the symbols returned by result-type as keys into a hash table or an alist. For instance, you can write a function to print a summary of the counts and percentages of each type of result using an alist that maps each type plus the extra symbol total to a count.
(defun analyze-results (results)
  (let* ((keys '(total correct false-positive
                 false-negative missed-ham missed-spam))
         (counts (loop for x in keys collect (cons x 0))))
    (dolist (item results)
      (incf (cdr (assoc 'total counts)))
      (incf (cdr (assoc (result-type item) counts))))
    (loop with total = (cdr (assoc 'total counts))
          for (label . count) in counts
          do (format t "~&~@(~a~):~20t~5d~,5t: ~6,2f%~%"
                     label count (* 100 (/ count total))))))
This function will give output like this when passed a list of results generated by test-classifier:
SPAM> (analyze-results *results*)
Total:               3761 : 100.00%
Correct:             3689 :  98.09%
False-positive:         4 :   0.11%
False-negative:         9 :   0.24%
Missed-ham:            19 :   0.51%
Missed-spam:           40 :   1.06%
NIL
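The same tally works with a hash table instead of an alist. The following is a minimal sketch of that variant, not the version used in this chapter: count-result-types is a hypothetical name, and the result-type shown here is a stand-in written to match the REPL examples above so the snippet can be loaded on its own.

```lisp
;; Stand-in for RESULT-TYPE, written to agree with the REPL examples above.
(defun result-type (result)
  (destructuring-bind (&key type classification &allow-other-keys) result
    (ecase type
      (ham
       (ecase classification
         (ham 'correct)
         (spam 'false-positive)
         (unsure 'missed-ham)))
      (spam
       (ecase classification
         (ham 'false-negative)
         (spam 'correct)
         (unsure 'missed-spam))))))

;; COUNT-RESULT-TYPES is a hypothetical name, not part of the original code.
(defun count-result-types (results)
  "Return a hash table mapping each result-type symbol to its count."
  (let ((counts (make-hash-table :test #'eq)))
    ;; GETHASH's third argument supplies 0 the first time a key is seen,
    ;; so no key needs to be initialized up front.
    (dolist (item results counts)
      (incf (gethash (result-type item) counts 0)))))
```

Unlike the alist version, a result type that never occurs simply has no entry, so look up counts with a default, as in (gethash 'false-negative counts 0).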
And as a last bit of analysis, you might want to look at why an individual message was classified the way it was. The following functions will show you:
(defun explain-classification (file)
  (let* ((text (start-of-file file *max-chars*))
         (features (extract-features text))
         (score (score features))
         (classification (classification score)))
    (show-summary file text classification score)
    (dolist (feature (sorted-interesting features))
      (show-feature feature))))

(defun show-summary (file text classification score)
  (format t "~&~a" file)
  (format t "~2%~a~2%" text)
  (format t "Classified as ~a with score of ~,5f~%" classification score))

(defun show-feature (feature)
  (with-slots (word ham-count spam-count) feature
    (format
     t "~&~2t~a~30thams: ~5d; spams: ~5d;~,10tprob: ~,f~%"
     word ham-count spam-count (bayesian-spam-probability feature))))

(defun sorted-interesting (features)
  (sort (remove-if #'untrained-p features) #'< :key #'bayesian-spam-probability))
Obviously, you could do a lot more with this code. To turn it into a real spam-filtering application, you'd need
