(if (loop for n in numbers never (oddp n))
    (print "All numbers even."))
A thereis clause is used to test whether the test form is ever true. As soon as the test form returns a non-NIL value, the loop is terminated, returning that value. If the loop runs to completion, the thereis clause provides a default return value of NIL.
(loop for char across "abc123" thereis (digit-char-p char)) ==> 1
(loop for char across "abcdef" thereis (digit-char-p char)) ==> NIL
Now you've seen all the main features of the LOOP facility. You can combine any of the clauses I've discussed as long as you abide by the following rules:
• The named clause, if any, must be the first clause.
• After the named clause come all the initially, with, for, and repeat clauses.
• Then come the body clauses: conditional and unconditional execution, accumulation, and termination test.[244]
• End with any finally clauses.
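The ordering rules above can be seen in a small sketch. The function name, the loop name summing, and the cap of 100 are all hypothetical, chosen only to give each kind of clause something to do.

```lisp
;; Hypothetical example: sum the even numbers in a list, stopping at the
;; first negative number and capping the result at an arbitrary limit.
(defun sum-evens-until-negative (numbers)
  (loop named summing                       ; 1. the named clause comes first
        with limit = 100                    ; 2. then with, for, repeat, initially
        for n in numbers
        until (minusp n)                    ; 3. body: termination test
        when (evenp n) sum n into total     ;    body: conditional accumulation
        finally (return-from summing (min total limit)))) ; 4. finally comes last
```

With these assumptions, (sum-evens-until-negative '(2 3 4 -1 6)) returns 6: the loop adds 2 and 4, stops at -1, and never sees the trailing 6.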
The LOOP macro will expand into code that performs the following actions:
• Initialize all local loop variables as declared with with or for clauses as well as those implicitly created by accumulation clauses. The initial value forms are evaluated in the order the clauses appear in the loop.
• Execute the forms provided by any initially clauses—the prologue—in the order they appear in the loop.
• Iterate, executing the body of the loop as described in the next paragraph.
• Execute the forms provided by any finally clauses—the epilogue—in the order they appear in the loop.
While the loop is iterating, the body is executed by first stepping any iteration control variables and then executing any conditional or unconditional execution, accumulation, or termination test clauses in the order they appear in the loop code. If any of the clauses in the loop body terminate the loop, the rest of the body is skipped and the loop returns, possibly after running the epilogue.
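One way to watch this sequence is to record each phase as it happens. The following is a minimal sketch; the function names and the keyword markers are hypothetical. The second function illustrates the "possibly" in the last sentence: a loop ended by an until clause still runs its epilogue, while one ended by a return clause does not.

```lisp
;; Record the prologue, each body iteration, and the epilogue in order.
(defun trace-loop-phases ()
  (let ((events '()))
    (loop initially (push :prologue events)
          for i from 1 to 2
          do (push (list :body i) events)
          finally (push :epilogue events))
    (nreverse events)))

;; Report whether the finally clause ran for a given way of ending the loop.
(defun epilogue-runs-p (terminator)
  (let ((ran nil))
    (ecase terminator
      (:until  (loop for i from 1 to 5
                     until (= i 3)
                     finally (setf ran t)))
      (:return (loop for i from 1 to 5
                     when (= i 3) return i
                     finally (setf ran t))))
    ran))
```

Calling (trace-loop-phases) returns (:PROLOGUE (:BODY 1) (:BODY 2) :EPILOGUE), while (epilogue-runs-p :until) returns T and (epilogue-runs-p :return) returns NIL.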
And that's pretty much all there is to it.[245] You'll use LOOP fairly often in the code later in this book, so it's worth having some knowledge of it. Beyond that, it's up to you how much you use it.
And with that, you're ready to dive into the practical chapters that make up the rest of the book—up first, writing a spam filter.
23. Practical: A Spam Filter
In 2002 Paul Graham, having some time on his hands after selling Viaweb to Yahoo, wrote the essay 'A Plan for Spam'[246] that launched a minor revolution in spam-filtering technology. Prior to Graham's article, most spam filters were written in terms of handcrafted rules: if a message has
To recognize individual spam features you have to try to get into the mind of the spammer, and frankly I want to spend as little time inside the minds of spammers as possible.
To avoid having to think like a spammer, Graham decided to try distinguishing spam from nonspam, a.k.a.
In this chapter, you'll implement the core of a spam-filtering engine. You won't write a soup-to-nuts spam-filtering application; rather, you'll focus on the functions for classifying new messages and training the filter.
This application is going to be large enough that it's worth defining a new package to avoid name conflicts. For instance, in the source code you can download from this book's Web site, I use the package name COM.GIGAMONKEYS.SPAM, defining a package that uses both the standard COMMON-LISP package and the COM.GIGAMONKEYS.PATHNAMES package from Chapter 15, like this:
(defpackage :com.gigamonkeys.spam
  (:use :common-lisp :com.gigamonkeys.pathnames))
Any file containing code for this application should start with this line:
(in-package :com.gigamonkeys.spam)
You can use the same package name or replace com.gigamonkeys with some domain you control.[248]
You can also type this same form at the REPL to switch to this package to test the functions you write. In SLIME this will change the prompt from CL-USER> to SPAM> like this:
CL-USER> (in-package :com.gigamonkeys.spam)
#<The COM.GIGAMONKEYS.SPAM package>
SPAM>
Once you have a package defined, you can start on the actual code. The main function you'll need to implement has a simple job—take the text of a message as an argument and classify the message as spam, ham, or unsure. You can easily implement this basic function by defining it in terms of other functions that you'll write in a moment.
(defun classify (text)
  (classification (score (extract-features text))))
Reading from the inside out, the first step in classifying a message is to extract features to pass to the score function. In score you'll compute a value that can then be translated into one of three classifications—spam, ham, or unsure—by the function classification. Of the three functions, classification is the simplest. You can assume score will return a value near 1 if the message is a spam, near 0 if it's a ham, and near .5 if it's unclear.
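Given that assumption about score, classification could be sketched as a simple pair of threshold checks. The cutoffs of 0.4 and 0.6 and the two parameter names below are assumptions made for illustration; the text so far says only that scores near 0, 1, and .5 mean ham, spam, and unsure.

```lisp
;; Assumed cutoffs: scores at or below the first count as ham, scores at
;; or above the second count as spam, and anything in between is unsure.
(defparameter *max-ham-score* 0.4)
(defparameter *min-spam-score* 0.6)

(defun classification (score)
  (cond
    ((<= score *max-ham-score*) 'ham)
    ((>= score *min-spam-score*) 'spam)
    (t 'unsure)))
```

Keeping the cutoffs in special variables rather than hard-coding them makes it easy to tune how wide the unsure band is once the filter has been tested on real messages.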
