Onur Kuru and Deniz Yuret. Recognizing Lexical Entailment using
Substitutability. (in preparation). [ai.ku]
============================================================================
COLING 2016 Reviews for Submission #378
============================================================================
Title: Recognizing Lexical Entailment using Substitutability
Authors: Onur Kuru and Deniz Yuret
============================================================================
REVIEWER #1
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 4
Readability and clarity: 4
Meaningful comparison: 5
Substance: 3
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 4
Reviewer Confidence: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper proposes a new solution for lexical entailment that brings together
substitutability with entailment. While substitutability has always been tied
in with the definition of lexical entailment, the authors claim that this is
the first attempt to directly model that aspect. They compare their
(unsupervised) approach to other existing approaches, showing respectable
performance across different datasets and settings.
Overall, I like this paper. It is clearly written and uses simple ideas, but at
the same time it tries to approach the problem from a slightly different point
of view than existing work, and shows that this novel approach does fairly
well. While lexical entailment cannot always be modeled by substitution (for
example, when the word pair has different POS tags), there is definitely some
advantage to tying the two together - and this paper does that much more
explicitly than prior work.
I have some (mostly minor, except point 1) concerns with this work though:
1) The most major concern is a factual error in Table 3 - the numbers for
balAPinc evaluated on KDSZ in the different setting are incorrect (the Turney
and Mohammad paper reports 0.60 AP1 and 0.60 AP0). This possibly invalidates
some of the claims made in this paper regarding the efficacy of balAPinc vs.
Subs.
2) Some of the comparisons in this paper are not apples-to-apples since the
authors in this work do not deal with multi-word expressions and hence they
have to work with only a subset of some of the datasets (specifically the
comparisons on the KDSZ and the Zeichner datasets).
3) Since this is an unsupervised setup, I would have liked a bit more detail on
the experimental setup. Is it k-fold cross validation, or do you just use a
small subset of the dataset to tune thresholds?
4) The error analysis is not very insightful. Instead of reporting performance
across corpora/# of tokens/n in n-gram, I would have liked to see an error
analysis that is specific to the approach, along the lines of the "Substitute
Distributions" sections. For example, what is it that makes this model better?
Where does it do better than balAPinc? Are there things that balAPinc gets
right that the Subs model cannot?
5) There are some grammatical issues and typos. For instance:
- Page 1, Second paragraph, the sentence starting "Since lexical
entailment.." is not grammatical
- Page 6, the line above table 3, "they only dependent" is incorrect
============================================================================
REVIEWER #2
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 3
Readability and clarity: 4
Meaningful comparison: 4
Substance: 3
Impact of ideas: 2
Impact of resources: 1
Overall recommendation: 2
Reviewer Confidence: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper presents a rather simplistic scheme for inferring whether a lexical
entailment between a pair of words holds. The approach is based on a
probabilistic formulation of word substitutability in contexts, where language
models are used to estimate the probabilities of candidate words occurring in
given contexts. For each context with a blank placeholder, substitute
distributions are computed for the words occurring in that context by computing
the probability of each word in that context and then normalizing.
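For concreteness, a minimal sketch of this substitute-distribution computation
as the review describes it; the names are illustrative, not the authors' code,
and lm_logprob is a hypothetical stand-in for the paper's actual n-gram
language model scorer.

    import math

    def substitute_distribution(context, candidates, lm_logprob):
        """Normalized probability of each candidate filling the blank.

        context:     tokens with a single "_" placeholder,
                     e.g. ("the", "_", "barked")
        candidates:  candidate words for the blank
        lm_logprob:  hypothetical scorer returning the language-model
                     log-probability of a complete token sequence
        """
        scores = {}
        for w in candidates:
            # Fill the blank with the candidate and score the full sequence.
            filled = tuple(w if t == "_" else t for t in context)
            scores[w] = math.exp(lm_logprob(filled))
        total = sum(scores.values())
        # Normalize so the candidate probabilities sum to one: P(w | context).
        return {w: s / total for w, s in scores.items()}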
The proposed method has been tested on four data sets, and Average Precision
scores for both the entail and does-not-entail classes have been computed. With
the exception of one data set, the proposed approach performs better than other
competing approaches based on WordNet.
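As an aside on the metric: AP for the does-not-entail class is conventionally
computed by negating both the labels and the scores. A toy illustration using
scikit-learn, with made-up numbers rather than the paper's results:

    import numpy as np
    from sklearn.metrics import average_precision_score

    # Toy labels (1 = entails) and classifier scores; illustrative only.
    y_true = np.array([1, 0, 1, 1, 0])
    scores = np.array([0.9, 0.4, 0.7, 0.2, 0.1])

    ap1 = average_precision_score(y_true, scores)       # "entails" class
    ap0 = average_precision_score(1 - y_true, -scores)  # "does not entail"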
Additional Comments:
-- What is the point of equations 1-4 in section 2, given that they are not
used or referred to later? (2) seems to have a typesetting problem: should the
n in the denominator actually be the summation bound?
-- In your derivation of the P(b|a) approximation, I can probably buy the first
assumption of a and b being independent. I understand why you need the second
assumption mathematically but is that a reasonable assumption? It could very
well be, but you really need to provide a justification argument. Also, in the
final equation, are C and C' varying over the same set of contexts? (A possible
reconstruction of this derivation is sketched after these comments.)
-- In the references, please make sure your conference names are consistently
named and capitalized (e.g. the first two); also Srilm -> SRILM. Journal names
should all have capitalized initials (e.g. Turney et al.), and you should list
all the authors instead of using et al.
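For readers without the paper at hand, the derivation questioned in the second
comment above plausibly has the following shape; this is a reconstruction from
the review's hints, not the authors' actual notation.

    P(b \mid a) = \sum_{C} P(b \mid C, a)\, P(C \mid a)
                \approx \sum_{C} P(b \mid C)\, P(C \mid a)
                  % assumption 1: b independent of a given C
                \approx \sum_{C} \frac{P(b \mid C)\, P(a \mid C)}
                                      {\sum_{C'} P(a \mid C')}
                  % Bayes' rule, with uniform P(C) cancelling

Under this reading, C and C' range over the same set of extracted contexts;
the C' sum is just the Bayes normalizer after the uniform P(C) cancels.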
============================================================================
REVIEWER #3
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 3
Readability and clarity: 3
Meaningful comparison: 1
Substance: 4
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 3
Reviewer Confidence: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper proposes to address the lexical entailment problem by
directly modeling lexical substitutability: the word "dog" is assumed
to entail the word "animal" if "animal" can replace "dog" in the
contexts where "dog" occurs. This idea is closely related to the
context inclusion hypothesis (dog entails animal if dog occurs in a
subset of contexts where animal occurs), which has been used to design
asymmetric distributional similarity functions to detect lexical
entailment. This paper proposes instead to use n-gram language models
to directly score how often "animal" is a good replacement for "dog"
in contexts drawn from corpora.
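To make the directionality concrete, a minimal sketch of such a score under
the setup just described; the names are illustrative, and contexts_of and
subst_prob are assumed helpers, the latter along the lines of the substitute
distribution sketched under Reviewer #2.

    def entailment_score(a, b, contexts_of, subst_prob):
        """Directional substitutability score for "a entails b".

        contexts_of(a):      assumed provider of corpus contexts of word a,
                             each with a blanked out
        subst_prob(w, ctx):  assumed probability of w filling the blank,
                             e.g. a normalized language-model probability
        """
        contexts = contexts_of(a)
        # Average how well b substitutes for a across a's contexts. Note the
        # asymmetry: entailment_score(a, b, ...) need not equal
        # entailment_score(b, a, ...).
        return sum(subst_prob(b, c) for c in contexts) / len(contexts)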
This is an interesting idea which is presented clearly and with some
positive empirical results. However, several modeling and experimental
choices should be explained and motivated more thoroughly.
In Section (3), what are the consequences of the simplifying
assumption that context probabilities P(C) are uniform? This should be
discussed since some contexts are clearly more likely than others.
Experiments could be strengthened to include a controlled comparison
with other asymmetric unsupervised methods (such as the approaches
introduced in the related work) beyond the random and (symmetric?)
similarity baselines (Table 2). Results published elsewhere are
reported for comparison (Table 3) in an out-of-domain evaluation
setting, but it would be useful to see a comparison with controlled
training conditions.
Selecting good contexts to test how often "dog" can be substituted by
"animal" seems to be a crucial step in the approach introduced
here. This raises the question of how sensitive the approach is to the
nature, domain, amount, and diversity of the contexts. Relatedly, what was the
motivation for extracting contexts from the Reuters RCV1 dataset and
using distinct corpora for language modeling?
Other comments:
In Equation (2), $n$ should be on top of the sum in the denominator.
In Section (4), what does the FASTSUBS algorithm do?