257
Several spam corpora, including the SpamAssassin corpus, are linked to from http://nexp.cs.pdx.edu/~psam/cgi-bin/view/PSAM/CorpusSets.
258
If you wanted to conduct a test without disturbing the existing database, you could bind *feature-database*, *total-spams*, and *total-hams* with a LET, but then you'd have no way of looking at the database after the fact, unless you returned the values you used within the function.
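A minimal sketch of that idea, assuming the three variables are special variables defined with DEFVAR. The helper name call-with-scratch-database and the EQUAL hash table are assumptions made for illustration, not taken from the text.

```lisp
;; Stand-ins for the globals named in the text; the real definitions may differ.
(defvar *feature-database* (make-hash-table :test #'equal))
(defvar *total-spams* 0)
(defvar *total-hams* 0)

;; Hypothetical helper: run THUNK against a fresh, temporary database and
;; return the temporary values so they can still be inspected afterward.
(defun call-with-scratch-database (thunk)
  (let ((*feature-database* (make-hash-table :test #'equal))
        (*total-spams* 0)
        (*total-hams* 0))
    (funcall thunk)
    (values *feature-database* *total-spams* *total-hams*)))
```

Because the bindings are dynamic, anything THUNK calls sees the scratch values, and the global values are untouched once the LET exits.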
259
This algorithm is named for the same Fisher who invented the method used for combining probabilities and for Frank Yates, his coauthor of the book
260
In ASCII, the first 32 characters are nonprinting
261
Some binary file formats
262
The number abcd in hex, represented as a 16-bit quantity, consists of two bytes, ab and cd. It doesn't matter to a computer in what order these two bytes are stored as long as everybody agrees. Of course, whenever there's an arbitrary choice to be made between two equally good options, the one thing you can be sure of is that everybody is not going to agree. For more than you ever wanted to know about it, and to see where the terms big-endian and little-endian were first applied to byte order, see http://khavrinen.lcs.mit.edu/wollman/ien-137.txt.
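As a small illustration of the two possible orders, the following sketch splits a 16-bit integer into its bytes. The function names are hypothetical and chosen only for this example.

```lisp
;; Split a 16-bit integer into its two bytes, most significant byte first
;; (big-endian order).
(defun u2-big-endian-bytes (n)
  (list (ldb (byte 8 8) n) (ldb (byte 8 0) n)))

;; The same two bytes, least significant byte first (little-endian order).
(defun u2-little-endian-bytes (n)
  (reverse (u2-big-endian-bytes n)))

(u2-big-endian-bytes #xabcd)    ; => (171 205), that is, (#xab #xcd)
(u2-little-endian-bytes #xabcd) ; => (205 171), that is, (#xcd #xab)
```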
263
LDB and DPB, a related function, were named after the DEC PDP-10 assembly functions that did essentially the same thing. Both functions operate on integers as if they were represented using two's-complement format, regardless of the internal representation used by a particular Common Lisp implementation.
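A few expressions showing how the two functions behave, including the two's-complement view of a negative integer. The sample values are arbitrary.

```lisp
;; LDB extracts a byte (a run of bits) described by a BYTE specifier,
;; where (byte size position) counts POSITION from the low-order end.
(ldb (byte 8 0) #xabcd)        ; => 205 (#xcd), the low 8 bits
(ldb (byte 8 8) #xabcd)        ; => 171 (#xab), the next 8 bits

;; DPB returns a new integer with the specified bits replaced.
(dpb #xff (byte 8 8) #xabcd)   ; => 65485 (#xffcd)

;; Negative integers behave as if stored in two's complement,
;; so the low 8 bits of -1 are all ones.
(ldb (byte 8 0) -1)            ; => 255
```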
264
Common Lisp also provides functions for shifting and masking the bits of integers in a way that may be more familiar to C and Java programmers. For instance, you could write read-u2 yet a third way, using those functions, like this:

  (defun read-u2 (in)
    (logior (ash (read-byte in) 8) (read-byte in)))

which would be roughly equivalent to this Java method:

  public int readU2 (InputStream in) throws IOException {
    return (in.read() << 8) | (in.read());
  }

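The names LOGIOR and ASH are short for LOGical Inclusive OR and Arithmetic SHift. ASH shifts an integer a given number of bits to the left when its second argument is positive or to the right if the second argument is negative. LOGIOR combines integers by logically ORing each bit. Another function, LOGAND, performs a bitwise AND, which can be used to mask off certain bits. For this kind of bit twiddling, however, LDB and BYTE will be both more convenient and more idiomatic Common Lisp style.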
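To make the comparison concrete, here is a sketch that extracts the high byte of a 16-bit value in both styles. The helper names are hypothetical and exist only for this example.

```lisp
;; Shift-and-mask style: shift right by 8 bits, then mask off everything
;; but the low 8 bits.
(defun high-byte-with-masking (n)
  (logand (ash n -8) #xff))

;; LDB style: ask directly for the 8 bits starting at bit position 8.
(defun high-byte-with-ldb (n)
  (ldb (byte 8 8) n))

(ash 1 8)                        ; => 256
(ash 256 -8)                     ; => 1
(logior #xab00 #x00cd)           ; => 43981 (#xabcd)
(high-byte-with-masking #xabcd)  ; => 171
(high-byte-with-ldb #xabcd)      ; => 171
```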
265
Originally, UTF-8 was designed to represent a 31-bit character code and used up to six bytes per code point.
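The six-byte limit follows from how many payload bits each encoded length can carry. The sketch below assumes the code point ranges of the original UTF-8 specification (RFC 2279); the function name is hypothetical.

```lisp
;; Number of bytes the original, 31-bit UTF-8 design would use for CODE.
;; Range boundaries are assumed from RFC 2279.
(defun original-utf-8-length (code)
  (cond ((< code #x80) 1)          ; up to 7 bits
        ((< code #x800) 2)         ; up to 11 bits
        ((< code #x10000) 3)       ; up to 16 bits
        ((< code #x200000) 4)      ; up to 21 bits
        ((< code #x4000000) 5)     ; up to 26 bits
        ((< code #x80000000) 6)    ; up to 31 bits
        (t (error "~X does not fit in 31 bits" code))))

(original-utf-8-length #x41)        ; => 1
(original-utf-8-length #x7fffffff)  ; => 6
```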