Raconteur

“Good afternoon, ladies and gentlemen. My name is Joshua Strickland, team lead for visual intelligence development here at the Stanford Vision Lab. I’d like to thank you all for coming today.”

Strickland stood at the head of a darkened, windowless lecture hall in the basement of the Gates Computer Science Building. Beside him the camera-eye logo of the Vision Lab filled a large projection screen. In the PowerPoint afterglow he saw familiar and unfamiliar faces among a small audience seated primarily in the front two rows. He focused on the serious faces seated just before him.

“An especially warm welcome to our distinguished guests from the Transformational Convergence Technology Office. Thanks also to our faculty advisor, Doctor Lei Li, without whose support we would not be presenting to you today.”

There was timid applause from somewhere in the darkness.

Strickland paused to collect his thoughts. So much was riding on this. He took a breath, then began, “What you’re about to see is a visual intelligence technology we call Raconteur.” A click of his wireless remote, and the slide changed to an animation of dozens, then hundreds, and then thousands of individual video insets, swarming. It was a vast stream of graphic data. “Visual intelligence is often confused with ‘computer vision’-but it’s much more than that. Visual intelligence means giving machines the ability not merely to identify objects in images-which has been possible for years-but the cognitive ability to discern what’s occurring in a scene. Concept detection, integrated cognition, interpolation, prediction. What could have happened, and what might happen next. It means giving machines not only the ability to see but to understand what they see.”

He searched the faces of those front and center. “Why is this important?”

He clicked the remote, and the slide changed to surveillance images of London subway bombers moving through stations and standing in railcars. “In an increasingly dangerous world, video surveillance represents society’s best hope to detect threats before they materialize. But this flood of visual imagery means an exponential increase in the volume of surveillance video that must be analyzed-and analyzed real-time if it is to be of use not just in reviewing criminal acts after the fact but in preventing criminal acts.”

The image changed to that of a burned-out Starbucks on an urban street. Then another photo from a newspaper showing a burned-out SUV beneath the headline SENATOR ASSASSINATED IN TERROR BOMBING. “We need only consider the recent unsolved terror bombings here in the United States to recognize how critical visual intelligence is to our future.”

Strickland scanned the faces of his audience. They were with him.

“How do we imbue machines with this ability? We do this by emulating the way humans process spatiotemporal events. Human visual cognition is closely attuned to change, and it’s these changes that create what we call ‘attention states.’ We acquire ‘attention states’ from video imagery through an algorithmic mechanism that includes notions of focus of attention, markers placed on salient objects, and the critical relationships between those objects in terms of motion and contact. These are necessary to distinguish individual events from one another. A series of attentional states over time then becomes a visual attention trace-or VAT-which begins to form the elements of a story. One that can be programmatically narrated through machine-readable text-text that can then be algorithmically searched for relevance, in real time, by an ‘audience’ of other, simpler programs. This is why we call our system ‘Raconteur’-because it tells the story of what’s happening in a way that common systems can understand. And like any good storyteller, ‘Raconteur’ remembers how the current scene fits into the whole.”

Strickland knew that his combination of youth and poise would be an advantage here. Disruptive technology was like that. Now, at twenty-two, he was leading a team that was about to revolutionize visual image processing. Although he wasn’t the driving force behind the innovations, he did know how to spot and recruit talent to his work teams. If history was any guide, that was the primary skill necessary for success in Silicon Valley. Being able to spot a good idea and knowing who could make it work. Removing obstacles and inspiring others, that was the biggest part of innovation.

“We have worked with DARPA’s technical staff to coordinate the following demonstration, in strict adherence to the Mind’s-Eye Project guidelines. Please remember that our system has not been previously exposed to the images that you-and it-are about to see. We look forward to taking your questions after the test. Until then, ladies and gentlemen, I give you ‘Raconteur,’ the storyteller…”

More light applause as the screen went black.

Strickland stepped aside as two smaller screens glowed to life up front-one bearing the title “TCTO Phase 1-Recognition Test.” The other screen displayed a blinking cursor.

Strickland moved to the side to stand with his project team, bracing for whatever came next. He cast a tense look at his development lead, Vijay Prakash, but the handsome, dour Bengali ignored Strickland’s arched eyebrows and looked to the screen. The rest of the grad student crew-Sourav Chatterjee, Gerhard Koepple, Wang Bao-Rong, and Nikolay Kasheyev-nodded in acknowledgment of the moment. Then they all turned to watch the screens too.

The words “TCTO Phase 1-Recognition Test” soon appeared also on the right-hand screen. The twin projections were set up so that whatever appeared in the left-hand screen, Raconteur would have to make sense of and describe in text on the right-hand screen.

Strickland felt relief wash over him as he stood in the darkness. Failing simple character recognition while reading the title card would have killed them, but then, OCR was handled by a licensed library, not their code. Still, he knew the DARPA judges wouldn’t cut them any slack for choosing a bad library.

But the test was already moving on. No time to ponder disaster scenarios. The left-hand screen changed to black-and-white surveillance video. It depicted a woman walking down an office hallway carrying a cardboard records box.

Strickland tensed again. He’d seen the VI algorithms work a hundred thousand times and had a pretty good idea how they functioned, but they’d never been run live in front of such an important audience. What happened next would decide the next several years of his life-of their lives-and quite possibly the trajectory of Strickland’s career. He focused on the blinking cursor on the right-hand screen-the Raconteur output panel.

As the video continued, text began to appear…

Person carries object along corridor.

Murmurs of approval swept through the room, but Strickland remained tense. C’mon. Do it. Do it, baby…

The cursor then began expanding on the details.

Woman carries box along corridor.

More murmurs and some clapping. Strickland cast a glance at the DARPA managers, who were nodding and talking softly among themselves. Taking notes. A wave of relief flowed through him. He’d had no idea how clenched he was, but now that initial impressions were good, the judges would be more receptive if there was a later glitch. He told himself that no matter what happened from here on, they had at least avoided a meltdown. They had gotten on the scoreboard.

The scene changed to an exterior; an American soldier standing on a littered street in some Middle Eastern slum, weapon slung and motioning to unseen people. A small-possibly Iraqi-child entered the frame behind him. Strickland felt the dread returning, as the text scrolled…

Armed person… approached by child.

More applause and some actual shouts of excitement.

Strickland felt a smile crease his face before he clamped down on it. Too early to celebrate.

Uniformed soldier approached by child in street.

The hoots continued. So far so good, but Strickland knew the difficulty levels were only going to increase. As he watched, the system mistook another soldier entering the frame for a possible threat: #ALERT-armed person. Not too far off the truth, though.

The control frame faded to black and displayed the title: “TCTO Phase 1-Interpolation Test.”

Here we go. The complexity of visual concepts ramped up fast. It was why their system focused on deriving context first while interpreting a scene, and why it never forgot what it had seen previously. That was key to avoiding a lot of useless processing. Humans walking down a city sidewalk, for example, do not suddenly expect to see a mountain vista or a rolling sea all around them. That would be impossible-thus, even if these things appeared, they were likely to be graphical representations like ads, not the actual thing. Daisy-chaining events made it
