Hoifung Poon and Pedro Domingos. 2010. Unsupervised ontology induction from text. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp 296--305. Association for Computational Linguistics. cit 62. [semparse]
Extracting knowledge from unstructured text is a long-standing goal of NLP. Although learning approaches to many of its subtasks have been developed (e.g., parsing, taxonomy induction, information extraction), all end-to-end solutions to date require heavy supervision and/or manual engineering, limiting their scope and scalability. We present OntoUSP, a system that induces and populates a probabilistic ontology using only dependency-parsed text as input. OntoUSP builds on the USP unsupervised semantic parser by jointly forming ISA and IS-PART hierarchies of lambda-form clusters. The ISA hierarchy allows more general knowledge to be learned, and the use of smoothing for parameter estimation. We evaluate OntoUSP by using it to extract a knowledge base from biomedical abstracts and answer questions. OntoUSP improves on the recall of USP by 47% and greatly outperforms previous state-of-the-art approaches.
Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater and Mark Steedman. 2010. Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp 1223--1233. Association for Computational Linguistics. cit 82. [spf, semparse, d=geo]
(****)
Sentence to logical form mapping using CCG and unification (UBL) instead of GenLex.
Geo dataset, four languages, two meaning representations (funql, lambda).
Starts with a single lexical item per sentence, mapping the whole sentence to its LF.
Introduces a vertical bar | into CCG, which can match either / or \. (Similar to ZC07?)
To do: understand the SGD gradient, possibly by reading CC07.
Starting with a single lexical item and trying splits looks much less ad hoc than ZC05 and ZC07 with GenLex and an initial lexicon.
Only the proper noun NPs (e.g. Texas) are in the initial lexicon.
The splitting constraints in Sec 4.1 are interesting; can they be learned from data?
The split-merge process seems a bit ad hoc; a more principled Bayesian approach may be possible.
Good related work discussion in Sec 6.
UBL geo880: p=.941 r=.850 f=.893
UBL-s (2pass): p=.885 r=.879 f=.882
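As a quick sanity check on the numbers above, the sketch below recomputes f from the listed p and r. It assumes f is the standard balanced F1 (the harmonic mean of precision and recall), which the notes do not state explicitly. The function name `f1` and the `reported` table are just illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# (p, r, f) triples copied from the notes above.
reported = {
    "UBL": (0.941, 0.850, 0.893),
    "UBL-s": (0.885, 0.879, 0.882),
}

for name, (p, r, f) in reported.items():
    print(f"{name}: computed f={f1(p, r):.3f}, reported f={f:.3f}")
```

Under that assumption, both computed values agree with the reported ones to three decimals.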