Onur Kuru and Deniz Yuret. Recognizing Lexical Entailment using
Substitutability. (in preparation). [ai.ku]
============================================================================
COLING 2016 Reviews for Submission #378
============================================================================
Title: Recognizing Lexical Entailment using Substitutability
Authors: Onur Kuru and Deniz Yuret
============================================================================
REVIEWER #1
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 4
Readability and clarity: 4
Meaningful comparison: 5
Substance: 3
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 4
Reviewer Confidence: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper proposes a new solution for lexical entailment that brings together
substitutability with entailment. While substitutability has always been tied
in with the definition of lexical entailment, the authors claim that this is
the first attempt to directly model that aspect. They compare their
(unsupervised) approach to other existing approaches, showing respectable
performance across different datasets and settings.
Overall, I like this paper. It is clearly written and uses simple ideas, but at
the same time it tries to approach the problem from a slightly different point
of view than existing work, and shows that this novel approach does fairly
well. While lexical entailment cannot always be modeled by substitution (for
example, when the word pair has different POS tags), there is definitely some
advantage to tying the two together - and this paper does that much more
explicitly than prior work.
I have some (mostly minor, except point 1) concerns with this work though:
1) The most major concern is a factual error in Table 3 - the numbers for
balAPinc evaluated on KDSZ in the different setting are incorrect (the Turney
and Mohammad paper reports 0.60 AP1 and 0.60 AP0). This possibly invalidates
some of the claims made in this paper regarding the efficacy of balAPinc vs.
Subs.
2) Some of the comparisons in this paper are not apples-to-apples since the
authors in this work do not deal with multi-word expressions and hence they
have to work with only a subset of some of the datasets (specifically the
comparisons on the KDSZ and the Zeichner datasets).
3) Since this is an unsupervised setup, I would have liked a bit more detail on
the experimental setup. Is it k-fold cross validation, or do you just use a
small subset of the dataset to tune thresholds?
4) The error analysis is not very insightful. Instead of reporting performance
across corpora/# of tokens/n in n-gram, I would have liked to see an error
analysis that is specific to the approach, along the lines of the "Substitute
Distributions" sections. For example, what is it that makes this model better?
Where does it do better than balAPinc? Are there things that balAPinc gets
right that the Subs model cannot?
5) There are some grammatical issues and typos. For instance:
- Page 1, Second paragraph, the sentence starting "Since lexical
entailment.." is not grammatical
- Page 6, the line above table 3, "they only dependent" is incorrect
============================================================================
REVIEWER #2
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 3
Readability and clarity: 4
Meaningful comparison: 4
Substance: 3
Impact of ideas: 2
Impact of resources: 1
Overall recommendation: 2
Reviewer Confidence: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper presents a rather simplistic scheme for inferring whether a lexical
entailment between a pair of words holds. The approach is based on a
probabilistic formulation of word substitutability in contexts, where language
models are used to estimate the probabilities of candidate words occurring in
given contexts. For each context with a blank placeholder, substitute
distributions are computed for the words occurring in that context by computing
the probability of each word in that context and then normalizing.
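For concreteness, a minimal sketch of this substitute-distribution computation
as the review describes it; the names are illustrative, not the authors' code,
and lm_logprob is a hypothetical stand-in for the paper's actual n-gram
language model scorer.

    import math

    def substitute_distribution(context, candidates, lm_logprob):
        """Normalized probability of each candidate filling the blank.

        context:     tokens with a single "_" placeholder,
                     e.g. ("the", "_", "barked")
        candidates:  candidate words for the blank
        lm_logprob:  hypothetical scorer returning the language-model
                     log-probability of a complete token sequence
        """
        scores = {}
        for w in candidates:
            # Fill the blank with the candidate and score the full sequence.
            filled = tuple(w if t == "_" else t for t in context)
            scores[w] = math.exp(lm_logprob(filled))
        total = sum(scores.values())
        # Normalize so the candidate probabilities sum to one: P(w | context).
        return {w: s / total for w, s in scores.items()}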
The proposed method has been tested on four data sets, and Average Precision
scores for both the entail and does-not-entail classes have been computed. With
the exception of one data set, the proposed approach performs better than other
competing approaches based on WordNet.
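As an aside on the metric: AP for the does-not-entail class is conventionally
computed by negating both the labels and the scores. A toy illustration using
scikit-learn, with made-up numbers rather than the paper's results:

    import numpy as np
    from sklearn.metrics import average_precision_score

    # Toy labels (1 = entails) and classifier scores; illustrative only.
    y_true = np.array([1, 0, 1, 1, 0])
    scores = np.array([0.9, 0.4, 0.7, 0.2, 0.1])

    ap1 = average_precision_score(y_true, scores)       # "entails" class
    ap0 = average_precision_score(1 - y_true, -scores)  # "does not entail"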
Additional Comments:
-- What is the point of equations 1-4 in section 2, given that they are not
used or referred to later? (2) seems to have a typesetting problem: should the
n in the denominator actually be the summation bound?
-- In your derivation of the P(b|a) approximation, I can probably buy the first
assumption of a and b being independent. I understand why you need the second
assumption mathematically but is that a reasonable assumption? It could very
well be, but you really need to provide a justification argument. Also, in the
final equation, are C and C' varying over the same set of contexts? (A possible
reconstruction of this derivation is sketched after these comments.)
-- In the references, please make sure your conference names are consistently
named and capitalized (e.g. the first two); also Srilm -> SRILM. Journal names
should all have capitalized initials (e.g. Turney et al.), and you should list
all the authors instead of using et al.
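For readers without the paper at hand, the derivation questioned in the second
comment above plausibly has the following shape; this is a reconstruction from
the review's hints, not the authors' actual notation.

    P(b \mid a) = \sum_{C} P(b \mid C, a)\, P(C \mid a)
                \approx \sum_{C} P(b \mid C)\, P(C \mid a)
                  % assumption 1: b independent of a given C
                \approx \sum_{C} \frac{P(b \mid C)\, P(a \mid C)}
                                      {\sum_{C'} P(a \mid C')}
                  % Bayes' rule, with uniform P(C) cancelling

Under this reading, C and C' range over the same set of extracted contexts;
the C' sum is just the Bayes normalizer after the uniform P(C) cancels.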
============================================================================
REVIEWER #3
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 3
Readability and clarity: 3
Meaningful comparison: 1
Substance: 4
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 3
Reviewer Confidence: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper proposes to address the lexical entailment problem by
directly modeling lexical substitutability: the word "dog" is assumed
to entail the word "animal" if "animal" can replace "dog" in the
contexts where "dog" occurs. This idea is closely related to the
context inclusion hypothesis (dog entails animal if dog occurs in a
subset of contexts where animal occurs), which has been used to design
asymmetric distributional similarity functions to detect lexical
entailment. This paper proposes instead to use n-gram language models
to directly score how often "animal" is a good replacement for "dog"
in contexts drawn from corpora.
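To make the directionality concrete, a minimal sketch of such a score under
the setup just described; the names are illustrative, and contexts_of and
subst_prob are assumed helpers, the latter along the lines of the substitute
distribution sketched under Reviewer #2.

    def entailment_score(a, b, contexts_of, subst_prob):
        """Directional substitutability score for "a entails b".

        contexts_of(a):      assumed provider of corpus contexts of word a,
                             each with a blanked out
        subst_prob(w, ctx):  assumed probability of w filling the blank,
                             e.g. a normalized language-model probability
        """
        contexts = contexts_of(a)
        # Average how well b substitutes for a across a's contexts. Note the
        # asymmetry: entailment_score(a, b, ...) need not equal
        # entailment_score(b, a, ...).
        return sum(subst_prob(b, c) for c in contexts) / len(contexts)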
This is an interesting idea which is presented clearly and with some
positive empirical results. However, several modeling and experimental
choices should be explained and motivated more thoroughly.
In Section (3), what are the consequences of the simplifying
assumption that context probabilities P(C) are uniform? This should be
discussed since some contexts are clearly more likely than others.
Experiments could be strengthened to include a controlled comparison
with other asymmetric unsupervised methods (such as the approaches
introduced in the related work) beyond the random and (symmetric?)
similarity baselines (Table 2). Results published elsewhere are
reported for comparison (Table 3) in an out-of-domain evaluation
setting, but it would be useful to see a comparison with controlled
training conditions.
Selecting good contexts to test how often "dog" can be substituted by
"animal" seems to be a crucial step in the approach introduced
here. This raises the question of how sensitive the approach is to the
nature, domain, amount, and diversity of the contexts. Relatedly, what was the
motivation for extracting contexts from the Reuters RCV1 dataset and
using distinct corpora for language modeling?
Other comments:
In Equation (2), $n$ should be on top of the sum in the denominator.
In Section (4), what does the FASTSUBS algorithm do?