The
United States Department of Homeland Security plans to develop software
that analyses and summarises opinions expressed in articles, providing
a possible tool for better monitoring what is written about the US in
the global press.
The department says it will spend $2,4-million
over the next three years supporting research at three US universities
using computer science to analyse human language in texts.
"The
work is really designed to get information extraction that would help
the DHS review statements for sentiments or beliefs contained in
statements, and to provide intelligence analysts within DHS," said
Homeland Security spokesperson Christopher Kelly.
Kelly said the
software would offer the department staff "another resource to conduct
their work" -- even though the project has raised eyebrows among press
freedom advocates.
Janyce Wiebe of the University of Pittsburgh
in Pennsylvania, who will direct the research project, said that the
funding would go towards basic research and not towards any monitoring
of the global press.
The research will seek to "develop accurate and
robust techniques for extracting and summarising information about
events and opinions described in a text," Wiebe said.
Researchers
from Cornell University and the University of Utah will also
participate in the work, in a field computer scientists call "natural
language processing".
"Their focus is to develop simpler, more
efficient software, algorithms and mathematical architectures for use
in a broad range of computing applications," Kelly at the DHS said.
The
research team has gathered more than 270 000 articles from 180
news sources around the world -- including Agence France-Presse --
published between June 2001 and May 2002 and covering a range of
subjects including elections in Zimbabwe, relations between China and
Taiwan, treatment of detainees at Guantanamo and the Kyoto protocol.
Each article has been manually annotated "with the meanings we want the software to learn to understand," Wiebe said.
The
software envisaged by the DHS-funded research would be capable of
tracking the ambiguities of human language, distinguishing the meaning
of a sentence depending on context and summarising descriptions and
opinions that appear in several different texts.
The researchers and DHS officials declined to discuss the possible uses of the software.
"It's just too early to speculate about what it would evolve into," Kelly said.
Several
press freedom organisations have expressed concern that the US
government wants to create a database of certain media, particularly
outlets that are the most critical of Washington.
"We're taking
a very hard look to make sure that the outcomes of this are really in
line with the missions of the DHS" to protect the United States from
attack, said Kelly.
Asked if the software under development
could allow authorities one day to determine which media or journalists
appeared hostile to the United States, Kelly said it was too early to
say. - Sapa-AFP