Integrating Web Resources and Lexicons into a Natural Language Query
System
Boris Katz, Deniz Yuret, Jimmy Lin, Sue Felshin, Rebecca Schulman,
Adnan Ilik, Ali Ibrahim, and Philip Osafo-Kwaako (1999)
Integrating Large Lexicons and Web Resources into a Natural Language
Query System. In Proceedings of the 6th IEEE International Conference
on Multimedia Computing and Systems (IEEE ICMCS'99).
Abstract:
The START system responds to natural language queries with answers in
text, pictures, and other media. START's sentence-level natural
language parsing relies on a number of mechanisms to help it process
the huge, diverse resources available on the World Wide Web. Blitz, a
hybrid heuristic- and corpus-based natural language preprocessor,
enables START to integrate a large and ever-changing lexicon of proper
names by using heuristic rules and precompiled tables of symbols to
preprocess various highly regular and fixed expressions into lexical
tokens. LaMeTH, a content-based system for extracting information
from HTML documents, assists START by providing a uniform method of
accessing information on the Web in real time. These mechanisms have
considerably improved START's ability to analyze real-world sentences
and answer queries through expansion of its lexicon and integration of
Web resources.
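
The preprocessing idea described for Blitz, combining heuristic rules with
precompiled tables so that fixed expressions reach the parser as single
lexical tokens, can be illustrated with a minimal Python sketch. This is not
Blitz's actual implementation: the table contents, the date rule, the token
format, and the names NAME_TABLE, DATE_RE, and preprocess are all assumptions
made for illustration only.

```python
import re

# Hypothetical precompiled table mapping lowercased word sequences to a
# single lexical token (a stand-in for a large proper-name lexicon).
NAME_TABLE = {
    ("new", "york", "city"): "New_York_City",
    ("world", "wide", "web"): "World_Wide_Web",
}
MAX_NAME_LEN = max(len(key) for key in NAME_TABLE)

# Hypothetical heuristic rule for one highly regular expression type:
# dates written like "July 4, 1999".
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_RE = re.compile(r"\b(" + MONTHS + r") (\d{1,2}), (\d{4})\b")


def preprocess(sentence):
    """Collapse dates and known proper names into single tokens."""
    # Heuristic pass: rewrite each date as one underscore-joined token.
    sentence = DATE_RE.sub(
        lambda m: "DATE_{}_{}_{}".format(m.group(1), m.group(2), m.group(3)),
        sentence,
    )
    # Split into word tokens and standalone punctuation marks.
    words = re.findall(r"\w+|[^\w\s]", sentence)

    tokens = []
    i = 0
    while i < len(words):
        # Table pass: try the longest candidate name first.
        for n in range(min(MAX_NAME_LEN, len(words) - i), 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in NAME_TABLE:
                tokens.append(NAME_TABLE[key])
                i += n
                break
        else:
            # No table entry starts here; keep the word unchanged.
            tokens.append(words[i])
            i += 1
    return tokens


if __name__ == "__main__":
    print(preprocess("Who visited New York City on July 4, 1999?"))
```

In this sketch the table lookup is greedy and longest-match-first, so a longer
entry takes precedence over any shorter entry that begins at the same word. A
sentence-level parser would then treat each merged token as a single lexical
item.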