Blitz: A Preprocessor for Detecting Context-Independent Linguistic Structures

Boris Katz, Deniz Yuret, Jimmy Lin, Sue Felshin, Rebecca Schulman, and Adnan Ilik (1998)
Blitz: A Preprocessor for Detecting Context-Independent Linguistic Structures. In Proceedings of the 5th Pacific Rim International Conference on Artificial Intelligence.

Abstract:

The flow of natural language is often broken by constructions which are difficult to analyze with conventional linguistic parsers. To handle these constructions, which include numbers, dates, addresses, etc., and, to a lesser extent, proper nouns, natural language systems typically implement specialized new rules. This leads to a level of complexity which renders development and maintenance difficult. Analyzing and tokenizing these constructions with an independent preprocessor can alleviate the burden on already taxed systems. Because these constructions have highly regular forms, and can be largely understood in the absence of context, it is possible to shift the burden of processing away from the primary parser, and onto a simpler, faster, non-linguistic preprocessor. This paper describes Blitz, a hybrid database- and heuristic-based natural language preprocessor, which has been integrated into the START Natural Language System in order to demonstrate how non-linguistic preprocessing can improve parsing. As a result, START's ability to analyze real-world sentences has improved considerably.
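
To make the idea of non-linguistic preprocessing concrete, here is a minimal sketch in Python. It is not Blitz's implementation: the two regular expressions, the placeholder format, and the function name `preprocess` are all assumptions chosen for illustration, and the database component the abstract mentions is omitted. The sketch only shows the general shape: constructions with a regular form are recognized without context and replaced by single tokens before the sentence reaches the main parser.

```python
import re

# Hypothetical patterns for illustration only; these are not Blitz's rules.
# Order matters: dates are matched first so their digits are not
# picked up separately by the number pattern.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
PATTERNS = [
    ("DATE", re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}\b")),
    ("NUMBER", re.compile(r"\b\d+(?:,\d{3})*(?:\.\d+)?\b")),
]


def preprocess(sentence):
    """Replace regular, context-independent constructions with tokens.

    Returns the rewritten sentence and a table mapping each placeholder
    token to its (type, original text) pair, so that a downstream parser
    can treat the construction as a single unit.
    """
    table = {}
    for label, pattern in PATTERNS:
        def substitute(match, label=label):
            # Placeholder names are an assumption of this sketch.
            key = f"{label}_{len(table)}"
            table[key] = (label, match.group(0))
            return key

        sentence = pattern.sub(substitute, sentence)
    return sentence, table


if __name__ == "__main__":
    text, table = preprocess("The meeting on March 3, 1998 drew 1,200 people.")
    print(text)
    print(table)
```

In this sketch the main parser would see one opaque token per date or number and could look up its type and original text in the table, instead of needing its own grammar rules for those forms.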