Practical Common Lisp, by Peter Seibel
in a moment, that takes a filename and a maximum number of characters to return. train-from-corpus looks like this:

(defparameter *max-chars* (* 10 1024))  ; read at most 10K characters per file

(defun train-from-corpus (corpus &key (start 0) end)
  ;; Each CORPUS element is a (file type) list; END defaults to the whole vector.
  (loop for idx from start below (or end (length corpus)) do
        (destructuring-bind (file type) (aref corpus idx)
          (train (start-of-file file *max-chars*) type))))
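The corpus format is implied by the DESTRUCTURING-BIND above: each element of the vector is a two-element list of a file and its type. As a minimal sketch of how the :start and :end keywords select a slice of that vector, the hypothetical count-corpus-types below (not part of the book's code) walks the same range as train-from-corpus but only tallies ham and spam entries, so it runs without a trained classifier. The toy corpus uses made-up names in place of real files.

```lisp
;; Hypothetical helper, not from the book: walks the same START/END range
;; as TRAIN-FROM-CORPUS but only counts the types it sees.
(defun count-corpus-types (corpus &key (start 0) end)
  (let ((ham-count 0)
        (spam-count 0))
    ;; END defaults to NIL, which means "through the last element".
    (loop for idx from start below (or end (length corpus)) do
          (destructuring-bind (file type) (aref corpus idx)
            (declare (ignore file))
            (ecase type
              (ham (incf ham-count))
              (spam (incf spam-count)))))
    (list :ham ham-count :spam spam-count)))

;; A tiny made-up corpus: each element is a (file type) list.
(defparameter *toy-corpus*
  (vector '("msg-1" ham) '("msg-2" spam) '("msg-3" spam)))
```

With no keywords the whole vector is counted, giving (:HAM 1 :SPAM 2); :start 1 skips the first entry and gives (:HAM 0 :SPAM 2), while :end 1 stops after it and gives (:HAM 1 :SPAM 0).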

The test-from-corpus function is similar, except you want to return a list containing the result of each classification so you can analyze the results after the fact. Thus, you should capture both the classification and the score returned by classify and then collect a list of the filename, the actual type, the type returned by classify, and the score. To make the results more human-readable, you can include keywords in the list to indicate which values are which.

(defun test-from-corpus (corpus &key (start 0) end)
  (loop for idx from start below (or end (length corpus)) collect
        (destructuring-bind (file type) (aref corpus idx)
          ;; CLASSIFY returns the classification and the score as two values.
          (multiple-value-bind (classification score)
              (classify (start-of-file file *max-chars*))
            (list
             :file file
             :type type
             :classification classification
             :score score)))))
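The key step here is MULTIPLE-VALUE-BIND: a plain call to classify inside a LIST form would keep only the primary value and silently drop the score. A sketch under stated assumptions: fake-classify is a hypothetical stand-in for classify that returns a classification and a score as two values, and fake-result builds the same keyword-tagged list that test-from-corpus collects.

```lisp
;; Hypothetical stand-in for CLASSIFY: returns two values, a symbol and a score.
(defun fake-classify (text)
  (if (search "free money" text)
      (values 'spam 0.99)
      (values 'ham 0.01)))

;; Builds the same keyword-tagged list that TEST-FROM-CORPUS collects.
(defun fake-result (file type text)
  (multiple-value-bind (classification score) (fake-classify text)
    (list :file file :type type :classification classification :score score)))
```

Because the result is a plist, GETF pulls out any field: (getf (fake-result "msg-2" 'spam "free money now") :classification) returns SPAM. By contrast, (list (fake-classify "free money now")) returns just (SPAM), because only the primary value survives an ordinary function call.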

A Couple of Utility Functions

To finish the implementation of test-classifier, you need to write two utility functions that don't have anything in particular to do with spam filtering: shuffle-vector and start-of-file.

An easy and efficient way to implement shuffle-vector is the Fisher-Yates algorithm.[259] You can start by implementing a function, nshuffle-vector, that shuffles a vector in place. Its name follows the naming convention of other destructive functions such as NCONC and NREVERSE. It looks like this:

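(defun nshuffle-vector (vector)
  ;; Walk from the last index down to 1, swapping each element with a
  ;; randomly chosen element at or before it.
  (loop for idx downfrom (1- (length vector)) to 1
        for other = (random (1+ idx))
        do (unless (= idx other)
             (rotatef (aref vector idx) (aref vector other))))
  vector)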
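Why does nshuffle-vector draw other from (random (1+ idx)) rather than from the whole vector? With n elements there are n * (n-1) * ... * 2 = n! equally likely sequences of choices, and each one produces a different permutation, so every ordering is equally likely. The sketch below checks that property exhaustively for small vectors by replacing RANDOM with an explicit list of choices. The names shuffle-with-choices, all-choice-lists, and permutation-counts are hypothetical, and unlike nshuffle-vector this version does not skip the swap when the two indices are equal (swapping an element with itself is a no-op anyway).

```lisp
;; Hypothetical variant of NSHUFFLE-VECTOR: the swap index for each step is
;; taken from the list CHOICES instead of from RANDOM.
(defun shuffle-with-choices (vector choices)
  (let ((v (copy-seq vector)))
    (loop for idx downfrom (1- (length v)) to 1
          ;; The real algorithm would use (random (1+ idx)) here.
          do (rotatef (aref v idx) (aref v (pop choices))))
    v))

;; Every list whose Nth element is a non-negative integer below the Nth
;; element of RANGES.
(defun all-choice-lists (ranges)
  (if (null ranges)
      (list '())
      (loop for choice below (first ranges)
            nconc (mapcar (lambda (tail) (cons choice tail))
                          (all-choice-lists (rest ranges))))))

;; Runs the shuffle once for every legal choice list and counts how often
;; each resulting permutation appears.
(defun permutation-counts (vector)
  (let ((counts (make-hash-table :test #'equalp))
        ;; At step IDX the real algorithm picks from (RANDOM (1+ IDX)).
        (ranges (loop for idx downfrom (1- (length vector)) to 1
                      collect (1+ idx))))
    (dolist (choices (all-choice-lists ranges) counts)
      (incf (gethash (shuffle-with-choices vector choices) counts 0)))))
```

For a three-element vector, (hash-table-count (permutation-counts (vector 'a 'b 'c))) is 6 and every count in the table is 1; for four elements the table holds 24 distinct permutations, again once each.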

The nondestructive version simply makes a copy of the original vector and passes it to the destructive version.

(defun shuffle-vector (vector)
  ;; COPY-SEQ makes a fresh vector, so the caller's VECTOR is left untouched.
  (nshuffle-vector (copy-seq vector)))
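The copy is what makes shuffle-vector safe to call on a vector the caller still needs. A minimal illustration, using hypothetical variable names: swapping elements of a COPY-SEQ copy with ROTATEF, the same kind of swap nshuffle-vector performs, leaves the original untouched. Note that COPY-SEQ is shallow: the new vector is fresh, but its elements are the same objects as the original's.

```lisp
;; Hypothetical demonstration variables.
(defparameter *original* (vector 1 2 3 4 5))
(defparameter *copy* (copy-seq *original*))

;; Swap the first and last elements of the copy only.
(rotatef (aref *copy* 0) (aref *copy* 4))
```

Afterward *copy* is #(5 2 3 4 1) while *original* is still #(1 2 3 4 5).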

The other utility function, start-of-file, is almost as straightforward, with just one wrinkle. The most efficient way to read the contents of a file into memory is to create an array of the appropriate size and use READ-SEQUENCE to fill it in. So it might seem you could make a character array whose size is either the size of the file or the maximum number of characters you want to read, whichever is smaller. Unfortunately, as I mentioned in Chapter 14, the function FILE-LENGTH isn't entirely well defined when dealing with character streams, since the number of characters encoded in a file can depend on both the character encoding used and the particular text in the file. In the worst case, the only way to get an accurate count of the characters in a file is to actually read the whole file. Thus, it's ambiguous what FILE-LENGTH should do when passed a character stream; in most implementations, FILE-LENGTH always returns the number of octets in the file, which may be greater than the number of characters that can be read from the file.
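To see concretely why an octet count can overstate the number of characters: in UTF-8, characters up to U+007F take one octet each, while all others take two, three, or four. The hypothetical utf-8-octet-count below computes the encoded size of a string. It assumes CHAR-CODE returns the Unicode code point, which holds in common implementations such as SBCL but is not required by the standard.

```lisp
;; Hypothetical helper: the number of octets STRING would occupy if encoded
;; as UTF-8. Assumes CHAR-CODE returns the Unicode code point.
(defun utf-8-octet-count (string)
  (loop for char across string
        sum (let ((code (char-code char)))
              (cond ((< code #x80) 1)      ; ASCII
                    ((< code #x800) 2)
                    ((< code #x10000) 3)
                    (t 4)))))

;; "café", built with CODE-CHAR so this source stays pure ASCII.
(defparameter *cafe* (coerce (list #\c #\a #\f (code-char #xE9)) 'string))
```

*cafe* has 4 characters but needs 5 octets in UTF-8, so for a file containing just that word, an implementation whose FILE-LENGTH reports octets would return one more than READ-SEQUENCE can actually deliver.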

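However, READ-SEQUENCE returns the index of the first element of the sequence that it didn't fill, which, when filling from the start, is the number of characters actually read. So you can attempt to read the number of characters reported by FILE-LENGTH (capped at the maximum you want) and return a substring if fewer characters were actually read.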

(defun start-of-file (file max-chars)
  (with-open-file (in file)
    (let* ((length (min (file-length in) max-chars))
           (text (make-string length))
           ;; READ-SEQUENCE returns how many characters it actually filled in.
           (read (read-sequence text in)))
      (if (< read length)
          (subseq text 0 read)
          text))))
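The allocate-then-trim pattern in start-of-file doesn't depend on files, so it can be tried on a string stream. In this sketch, read-up-to is a hypothetical helper that takes an already open character stream and an upper bound: a buffer larger than the available text gets trimmed with SUBSEQ, and a smaller buffer simply caps the read.

```lisp
;; Hypothetical helper: the START-OF-FILE pattern applied to any open
;; character stream.
(defun read-up-to (stream max-chars)
  (let* ((text (make-string max-chars))
         ;; Number of characters actually stored, starting at index 0.
         (read (read-sequence text stream)))
    (if (< read max-chars)
        (subseq text 0 read)   ; stream ran out early: drop the unused tail
        text)))
```

(with-input-from-string (in "hello") (read-up-to in 10)) returns "hello" even though a 10-character buffer was allocated, and (with-input-from-string (in "hello world") (read-up-to in 5)) also returns "hello", because the buffer caps the read.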

Analyzing the Results

Now you're ready to write some code to analyze the results generated by test-classifier. Recall that test-classifier returns the list returned by test-from-corpus, in which each element is a plist representing the result of classifying one file. This plist contains the name of the file, the actual type of the file, the classification, and the score returned by classify. The first bit of analytical code you should write is a function that returns a symbol indicating whether a given result was correct, a false positive, a false negative, a missed ham, or a missed spam. You can use DESTRUCTURING-BIND to pull out the :type and :classification elements of an individual result list (using &allow-other-keys to tell DESTRUCTURING-BIND to ignore any other key/value pairs it sees) and then use nested ECASE forms to translate the different pairings into a single symbol.

(defun result-type (result)
  (destructuring-bind (&key type classification &allow-other-keys) result
    (ecase type
      (ham
       (ecase classification
         (ham 'correct)
         (spam 'false-positive)
         (unsure 'missed-ham)))
      (spam
       (ecase classification
         (ham 'false-negative)
         (spam 'correct)
         (unsure 'missed-spam))))))

You can test out this function at the REPL.
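Before trying result-type itself, you can check the two mechanisms it relies on in isolation. This self-contained sketch uses a made-up result plist shaped like the ones test-from-corpus collects; the names *sample-result*, type-and-classification, and ecase-rejects-p are hypothetical. The first function shows &allow-other-keys letting DESTRUCTURING-BIND skip :file and :score; the second shows that ECASE signals a TYPE-ERROR when no clause matches, which is why result-type fails loudly instead of returning NIL for an unexpected type or classification.

```lisp
;; A made-up result in the shape TEST-FROM-CORPUS collects.
(defparameter *sample-result*
  (list :file "msg-7" :type 'spam :classification 'unsure :score 0.5))

;; &ALLOW-OTHER-KEYS lets DESTRUCTURING-BIND ignore :FILE and :SCORE.
(defun type-and-classification (result)
  (destructuring-bind (&key type classification &allow-other-keys) result
    (list type classification)))

;; True when ECASE finds no matching clause for VALUE and signals TYPE-ERROR.
(defun ecase-rejects-p (value)
  (handler-case (progn (ecase value (ham 1) (spam 2)) nil)
    (type-error () t)))
```

(type-and-classification *sample-result*) returns (SPAM UNSURE); (ecase-rejects-p 'maybe) is true, while (ecase-rejects-p 'ham) is false.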
