Onur Kuru, Ozan Arkan Can and Deniz Yuret. 2016. CharNER: Character-Level
Named Entity Recognition. In COLING, December.
COLING 2016 review:
Title: CharNER: Character-Level Named Entity Recognition
Authors: Onur Kuru, Ozan Arkan Can and Deniz Yuret
============================================================================
REVIEWER #1
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 3
Readability and clarity: 4
Meaningful comparison: 4
Substance: 3
Impact of ideas: 3
Impact of resources: 3
Overall recommendation: 4
Reviewer Confidence: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
The paper proposes character-based NER for languages with word segmentation.
Character-based tagging has been proposed previously; their contribution is to
apply LSTM models in the character-based tagging setting.
However, the results are only fair for the targeted languages.
In my opinion, the method may be promising for languages without word
segmentation, such as Chinese and Japanese, since word segmentation errors
affect NER scores.
============================================================================
REVIEWER #2
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 4
Readability and clarity: 4
Meaningful comparison: 4
Substance: 3
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 4
Reviewer Confidence: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper presents a character-based model for named-entity recognition based
on bidirectional LSTMs. There is very recent research that tries to do
something similar, in NAACL-16 (Lample et al.) and ACL (Ma and Hovy, for
example), with the exact same motivation: remove external resources, such as
gazetteers, and use a character-based approach to achieve high results. This
should not invalidate the paper, though.
The main difference from the previous research (mentioned above) is that this
model examines a sentence as a sequence of characters and outputs a tag
distribution for each character. They then use transition matrices that allow
only tags consistent with the word.
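The constrained decoding the review describes can be sketched as a Viterbi pass over per-character tag distributions. The sketch below is illustrative only (not the authors' code); it assumes the transition matrix simply forbids tag changes anywhere except next to a space character:

```python
import math

def viterbi_word_consistent(char_probs, is_space, tags):
    """Decode per-character tag distributions into a word-consistent tag
    sequence: the tag may only change at a word boundary (next to a space)."""
    n, k = len(char_probs), len(tags)
    logp = [[math.log(p) for p in row] for row in char_probs]
    score = list(logp[0])               # best log-prob ending in each tag
    back = [[0] * k for _ in range(n)]  # backpointers
    for i in range(1, n):
        if is_space[i] or is_space[i - 1]:
            # word boundary: any previous tag may precede any current tag
            best = max(range(k), key=lambda j: score[j])
            back[i] = [best] * k
            score = [score[best] + logp[i][j] for j in range(k)]
        else:
            # inside a word: the tag must stay the same
            back[i] = list(range(k))
            score = [score[j] + logp[i][j] for j in range(k)]
    j = max(range(k), key=lambda j: score[j])
    path = [j]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return [tags[j] for j in path]
```

For example, on the characters of "john works" with noisy per-character scores, the decode returns one tag per character with no tag change inside either word.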
The results are nice; however, they are not the best overall. They are very
good compared to systems that do not use external resources, including word
embeddings; however, it should be a requirement to report the results provided
by the other systems without external resources (see Lample et al., for
example).
In Table 6, you present results for Ma and Hovy and Lample et al. and include
them in the "External" row; as far as I know, they only use word embeddings
(if they do)... I think that you should incorporate "word embeddings" into the
caption, otherwise readers might think that they use gazetteers.
Some missing references: Two EMNLP-15 papers that presented interesting results
for tagging, parsing and language modeling by using character-based embeddings.
- Wang Ling; Chris Dyer; Alan W Black; Isabel Trancoso; Ramon Fermandez; Silvio
Amir; Luis Marujo; Tiago Luis
Finding Function in Form: Compositional Character Models for Open Vocabulary
Word Representation
- Miguel Ballesteros; Chris Dyer; Noah A. Smith
Improved Transition-based Parsing by Modeling Characters instead of Words with
LSTMs
This paper is also worth mentioning, since it also produces an entire character
sequence for sentences: Bhuwan Dhingra; Zhong Zhou; Dylan Fitzpatrick; Michael
Muehl; William Cohen Tweet2Vec: Character-Based Distributed Representations for
Social Media
Minor comment:
Lample et al. do somewhat more than LSTM-CRF; they also presented a
shift-reduce algorithm that exploits character-based embeddings.
============================================================================
REVIEWER #3
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 4
Readability and clarity: 5
Meaningful comparison: 4
Substance: 4
Impact of ideas: 4
Impact of resources: 1
Overall recommendation: 4
Reviewer Confidence: 3
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
Very interesting work.
It clearly shows that a deep bidirectional LSTM architecture combined with a
Viterbi decoder effectively finds language specific features for NER.
Applying the algorithm to languages written without space characters, e.g.,
Chinese and Japanese, may be interesting.
============================================================================
EMNLP 2016 review:
Title: CharNER: Character-Level Named Entity Recognition
Authors: Onur Kuru, Ozan Arkan Can and Deniz Yuret
Review #1
Appropriateness: 5
Clarity: 4
Originality: 3
Soundness / Correctness: 4
Meaningful Comparison: 5
Substance: 4
Impact of Ideas / Results: 4
Impact of Accompanying Software: 3
Impact of Accompanying Dataset / Resource: 1
Recommendation: 3
Reviewer Confidence: 5
Comments
This paper presents a named entity recognizer in which the entire sentence is encoded as a sequence of characters, and a bidirectional LSTM is used to make predictions. This is unlike previous (and recent) approaches, such as Lample et al. 2016, which presented character-based representations of words and then an LSTM/bidirectional LSTM/stack-LSTM on top of that. This model is similar to the tweet2vec model recently accepted at ACL 2016, even though it tries to solve a different task. They examine a sentence as a sequence of characters and output a tag distribution for each character. This model, like Lample et al., has the potential to be language independent, and they apply it cross-lingually. The motivation and goals are also similar to Lample et al. (remove external features such as gazetteers, etc., and still achieve high results).
Figure 3 does a great job summarizing the entire paper.
In order to avoid inconsistent outputs like "J o h n w o r k s" tagged "P O O O G G G G O", they use a decoder, as in Wang et al. 2015, that applies a transition matrix; at the end they output the entire sequence.
They make a good comparison with related work, but since some of the models are freely available, I'd expect the authors to run them on the languages without public results (such as Arabic or Turkish).
In Table 5 you should definitely differentiate between systems that use gazetteers and neural models that use pretrained word embeddings. They are not the same thing, and as it is presented it might confuse the reader.
This is an interesting paper, but it lacks a bit of novelty given all the previous work that already demonstrated the usefulness of characters and sequential models for NER.
Minor comments: Missing ref (?) in related work.
Onur Kuru and Deniz Yuret. Recognizing Lexical Entailment using
Substitutability. (in preparation).
============================================================================
COLING 2016 Reviews for Submission #378
============================================================================
Title: Recognizing Lexical Entailment using Substitutability
Authors: Onur Kuru and Deniz Yuret
============================================================================
REVIEWER #1
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 4
Readability and clarity: 4
Meaningful comparison: 5
Substance: 3
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 4
Reviewer Confidence: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper proposes a new solution for lexical entailment that brings together
substitutability with entailment. While substitutability has always been tied
in with the definition of lexical entailment, the authors claim that this is
the first attempt to directly model that aspect. They compare their
(unsupervised) approach to other existing approaches, showing respectable
performance across different datasets and settings.
Overall, I like this paper. It is clear to read, and uses simple ideas, but at
the same time it tries to approach the problem from a slightly different point
of view than existing work, and shows that this novel approach does fairly
well. While lexical entailment cannot always be modeled by substitution (for
example, when the word pair has different POS tags), there is definitely some
advantage to tying the two together - and this paper does that much more
explicitly than prior work.
I have some (mostly minor, except point 1) concerns with this work though:
1) The most major concern is a factual error in Table 3 - the numbers for
balAPinc evaluated on KDSZ in the different setting are incorrect (the Turney
and Mohammad paper reports 0.60 AP1 and 0.60 AP0). This possibly invalidates
some of the claims made in this paper regarding the efficacy of balAPinc v/s
Subs.
2) Some of the comparisons in this paper are not apples-to-apples since the
authors in this work do not deal with multi-word expressions and hence they
have to work with only a subset of some of the datasets (specifically the
comparisons on the KDSZ and the Zeichner datasets).
3) Since this is an unsupervised setup, I would have liked a bit more detail on
the experimental setup. Is it k-fold cross validation, or do you just use a
small subset of the dataset to tune thresholds?
4) The error analysis is not very insightful. Instead of reporting performance
across corpora / # of tokens / n in n-gram, I would have liked to see an error
analysis that is specific to the approach, along the lines of the "Substitute
Distributions" section. For example, what is it that makes this model better?
Where does it do better than balAPinc? Are there things that balAPinc gets
right that the Subs model does not?
5) There are some grammatical issues and typos. For instance:
- Page 1, second paragraph, the sentence starting "Since lexical
entailment..." is not grammatical
- Page 6, the line above Table 3, "they only dependent" is incorrect
============================================================================
REVIEWER #2
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 3
Readability and clarity: 4
Meaningful comparison: 4
Substance: 3
Impact of ideas: 2
Impact of resources: 1
Overall recommendation: 2
Reviewer Confidence: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper presents a rather simplistic scheme for inferring whether a lexical
entailment between a pair of words holds. The approach is based on a
probabilistic formulation of word substitutability in contexts, where language
models are used to estimate the probabilities of candidate words occurring in
given contexts. For each context with a blank placeholder, substitute
distributions are computed for the words occurring in that context by computing
the probability of each word in that context and then normalizing.
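The substitute-distribution computation described above can be sketched as follows; this is a minimal illustration, not the paper's implementation, and the toy bigram scores stand in for a real n-gram language model (the paper uses an actual trained LM):

```python
import math

# Toy conditional log-scores standing in for an n-gram language model.
TOY_LM = {
    ("the", "dog"): -1.0, ("dog", "barked"): -0.5,
    ("the", "animal"): -1.5, ("animal", "barked"): -0.7,
    ("the", "car"): -1.2, ("car", "barked"): -6.0,
}

def substitute_distribution(left, right, vocab):
    """For a context 'left _ right', score each candidate word by the
    log-probability assigned to it in the blank, then normalize the
    scores into a probability distribution over substitutes."""
    logs = {w: TOY_LM.get((left, w), -20.0) + TOY_LM.get((w, right), -20.0)
            for w in vocab}
    z = sum(math.exp(s) for s in logs.values())
    return {w: math.exp(s) / z for w, s in logs.items()}
```

For the context "the _ barked", this yields a distribution in which "dog" outranks "animal", which in turn outranks "car".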
The proposed method has been tested on four data sets, and Average Precision
scores for both the entails and does-not-entail classes have been computed.
With the exception of one data set, the proposed approach performs better than
other competing approaches based on WordNet.
Additional Comments:
-- What is the point of equations 1-4 in Section 2, given that they are not
used/referred to later? (2) seems to have a typesetting problem: should the n
in the denominator actually be the summation bound?
-- In your derivation of the P(b|a) approximation, I can probably buy the first
assumption of a and b being independent. I understand why you need the second
assumption mathematically, but is it a reasonable assumption? It could very
well be, but you really need to provide a justifying argument. Also, in the
final equation, are C and C' varying over the same set of contexts?
-- In the references, please make sure your conference names are consistently
named and capitalized (e.g. the first two); also Srilm -> SRILM. Journal names
should all have initial capitals (e.g. Turney et al.), and you should list all
the authors rather than using et al.
============================================================================
REVIEWER #3
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 3
Readability and clarity: 3
Meaningful comparison: 1
Substance: 4
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 3
Reviewer Confidence: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper proposes to address the lexical entailment problem by
directly modeling lexical substitutability: the word "dog" is assumed
to entail the word "animal" if "animal" can replace "dog" in the
contexts where "dog" occurs. This idea is closely related to the
context inclusion hypothesis (dog entails animal if dog occurs in a
subset of contexts where animal occurs), which has been used to design
asymmetric distributional similarity functions to detect lexical
entailment. This paper proposes instead to use n-gram language models
to directly score how often "dog" is a good replacement for "animal"
in contexts drawn from corpora.
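One assumed estimator for such a directional score (for illustration only; the paper's exact formula may differ) averages, over contexts drawn for "dog", the normalized probability that "animal" fills the blank:

```python
import math

def context_logprob(context, word, lm_score):
    """Log-score of 'word' filling the blank in context (left, right),
    using a caller-supplied bigram-style scoring function."""
    left, right = context
    return lm_score(left, word) + lm_score(word, right)

def subs_score(a, b, contexts, vocab, lm_score):
    """Directional substitutability score for 'a entails b': the average,
    over contexts collected for word a, of the normalized probability
    that b is a good substitute in the blank."""
    total = 0.0
    for ctx in contexts:
        logs = [context_logprob(ctx, w, lm_score) for w in vocab]
        z = sum(math.exp(s) for s in logs)
        total += math.exp(context_logprob(ctx, b, lm_score)) / z
    return total / len(contexts)
```

Under this estimator the score is asymmetric by construction, since the contexts are drawn for the left word only.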
This is an interesting idea which is presented clearly and with some
positive empirical results. However, several modeling and experimental
choices should be explained and motivated more thoroughly.
In Section (3), what are the consequences of the simplifying
assumption that context probabilities P(C) are uniform? This should be
discussed since some contexts are clearly more likely than others.
Experiments could be strengthened to include controlled comparisons
with other asymmetric unsupervised methods (such as the approaches
introduced in related work) beyond the random and (symmetric?)
similarity baselines (Table 2). Results published elsewhere are
reported for comparison (Table 3) in an out-of-domain evaluation
setting, but it would be useful to see a comparison with controlled
training conditions.
Selecting good contexts to test how often "dog" can be substituted by
"animal" seems to be a crucial step in the approach introduced here. This
raises the question of how sensitive the approach is to the nature, domain,
amount, and diversity of contexts. Relatedly, what was the motivation for
extracting contexts from the Reuters RCV1 dataset while using distinct
corpora for language modeling?
Other comments:
In Equation (2), $n$ should be on top of the sum in the denominator.
In Section (4), what does the FASTSUBS algorithm do?