Integrating Web Resources and Lexicons into a Natural Language Query System

Boris Katz, Deniz Yuret, Jimmy Lin, Sue Felshin, Rebecca Schulman, Adnan Ilik, Ali Ibrahim, and Philip Osafo-Kwaako (1999).
Integrating Large Lexicons and Web Resources into a Natural Language Query System. In Proceedings of the 6th IEEE International Conference on Multimedia Computing and Systems (IEEE ICMCS'99).


The START system responds to natural language queries with answers in text, pictures, and other media. START's sentence-level natural language parsing relies on a number of mechanisms to help it process the huge, diverse resources available on the World Wide Web. Blitz, a hybrid heuristic- and corpus-based natural language preprocessor, enables START to integrate a large and ever-changing lexicon of proper names, by using heuristic rules and precompiled tables of symbols to preprocess various highly regular and fixed expressions into lexical tokens. LaMeTH, a content-based system for extracting information from HTML documents, assists START by providing a uniform method of accessing information on the Web in real time. These mechanisms have considerably improved START's ability to analyze real-world sentences and answer queries through expansion of its lexicon and integration of Web resources.
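
As a rough illustration of the preprocessing idea described for Blitz, the sketch below collapses fixed expressions into single lexical tokens before parsing, using a lookup table for multi-word proper names and one heuristic pattern for dates. This is not the Blitz implementation: the table contents, the tag names (PERSON, PLACE, DATE), the date pattern, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical precompiled table of multi-word proper names (illustrative entries only).
NAME_TABLE = {
    ("bill", "clinton"): "PERSON",
    ("new", "york", "city"): "PLACE",
}
MAX_NAME_LEN = max(len(key) for key in NAME_TABLE)

# Hypothetical heuristic rule for one highly regular expression class:
# dates written like "July 4, 1999".
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_RE = re.compile(r"\b(?:" + MONTHS + r")\s+\d{1,2},\s+\d{4}\b")


def _lookup_names(text):
    """Tokenize text, merging the longest table match at each position."""
    words = re.findall(r"\w+|[^\w\s]", text)
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_NAME_LEN, len(words) - i), 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in NAME_TABLE:
                tokens.append((" ".join(words[i:i + n]), NAME_TABLE[key]))
                i += n
                break
        else:
            # No table entry starts here: keep the word as an ordinary token.
            tokens.append((words[i], None))
            i += 1
    return tokens


def preprocess(sentence):
    """Return (text, tag) tokens; tag is None for ordinary words."""
    tokens = []
    pos = 0
    # Apply the heuristic date rule first, then table lookup on the rest.
    for match in DATE_RE.finditer(sentence):
        tokens.extend(_lookup_names(sentence[pos:match.start()]))
        tokens.append((match.group(0), "DATE"))
        pos = match.end()
    tokens.extend(_lookup_names(sentence[pos:]))
    return tokens


if __name__ == "__main__":
    for token in preprocess("Did Bill Clinton visit New York City on July 4, 1999?"):
        print(token)
```

In this sketch a downstream parser would see "Bill Clinton", "New York City", and "July 4, 1999" as single tokens rather than as sequences of unknown words, and the lexicon could be updated by replacing the table without touching the parser.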