Onur Kuru, Ozan Arkan Can and Deniz Yuret. 2016. CharNER: Character-Level
Named Entity Recognition. In COLING, December.
COLING 2016 review:
Title: CharNER: Character-Level Named Entity Recognition
Authors: Onur Kuru, Ozan Arkan Can and Deniz Yuret
============================================================================
REVIEWER #1
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 3
Readability and clarity: 4
Meaningful comparison: 4
Substance: 3
Impact of ideas: 3
Impact of resources: 3
Overall recommendation: 4
Reviewer Confidence: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
The paper proposes character-based NER for languages with word segmentation.
Character-based tagging has been proposed previously; their contribution is to
apply LSTM models in the character-based tagging setting.
However, the results are only fair for the targeted languages.
In my opinion, the method may be promising for languages without word
segmentation, such as Chinese and Japanese, since word segmentation errors
affect NER scores.
============================================================================
REVIEWER #2
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 4
Readability and clarity: 4
Meaningful comparison: 4
Substance: 3
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 4
Reviewer Confidence: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper presents a character-based model for named-entity recognition based
on bidirectional LSTMs. There is very recent research that tries to do
something similar, in NAACL-16 (Lample et al.) and ACL (Ma and Hovy, for
example), with the exact same motivation: remove external resources, such as
gazetteers, and use a character-based approach to achieve high results. This
should not invalidate the paper, though.
The main difference from the previous research (mentioned above) is that this
model examines a sentence as a sequence of characters and outputs a tag
distribution for each character. They then use transition matrices that allow
only tags consistent with the word.
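The constrained decoding the review describes can be sketched as a Viterbi pass over per-character tag distributions. The sketch below is illustrative only (not the authors' code); it assumes the transition matrix simply forbids tag changes anywhere except next to a space character:

```python
import math

def viterbi_word_consistent(char_probs, is_space, tags):
    """Decode per-character tag distributions into a word-consistent tag
    sequence: the tag may only change at a word boundary (next to a space)."""
    n, k = len(char_probs), len(tags)
    logp = [[math.log(p) for p in row] for row in char_probs]
    score = list(logp[0])               # best log-prob ending in each tag
    back = [[0] * k for _ in range(n)]  # backpointers
    for i in range(1, n):
        if is_space[i] or is_space[i - 1]:
            # word boundary: any previous tag may precede any current tag
            best = max(range(k), key=lambda j: score[j])
            back[i] = [best] * k
            score = [score[best] + logp[i][j] for j in range(k)]
        else:
            # inside a word: the tag must stay the same
            back[i] = list(range(k))
            score = [score[j] + logp[i][j] for j in range(k)]
    j = max(range(k), key=lambda j: score[j])
    path = [j]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return [tags[j] for j in path]
```

For example, on the characters of "john works" with noisy per-character scores, the decode returns one tag per character with no tag change inside either word.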
The results are nice; however, they are not the best overall. They are very
good compared to systems that do not use external resources, including word
embeddings; however, it should be a requirement to report the results provided
by the other systems without external resources (see Lample et al., for
example).
In Table 6, you present results for Ma and Hovy and Lample et al. and include
them in the "External" row; as far as I know, they only use word embeddings
(if they do)... I think that you should incorporate "word embeddings" into the
caption, otherwise readers might think that they use gazetteers.
Some missing references: Two EMNLP-15 papers that presented interesting results
for tagging, parsing and language modeling by using character-based embeddings.
- Wang Ling; Chris Dyer; Alan W Black; Isabel Trancoso; Ramon Fermandez; Silvio
Amir; Luis Marujo; Tiago Luis
Finding Function in Form: Compositional Character Models for Open Vocabulary
Word Representation
- Miguel Ballesteros; Chris Dyer; Noah A. Smith
Improved Transition-based Parsing by Modeling Characters instead of Words with
LSTMs
This paper is also worth mentioning, since it also produces an entire character
sequence for sentences: Bhuwan Dhingra; Zhong Zhou; Dylan Fitzpatrick; Michael
Muehl; William Cohen Tweet2Vec: Character-Based Distributed Representations for
Social Media
Minor comment:
Lample et al. do somewhat more than LSTM-CRF; they also presented a
shift-reduce algorithm that exploits character-based embeddings.
============================================================================
REVIEWER #3
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 4
Readability and clarity: 5
Meaningful comparison: 4
Substance: 4
Impact of ideas: 4
Impact of resources: 1
Overall recommendation: 4
Reviewer Confidence: 3
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
Very interesting work.
It clearly shows that a deep bidirectional LSTM architecture combined with a
Viterbi decoder effectively finds language specific features for NER.
Applying the algorithm to languages written without space characters, e.g.,
Chinese and Japanese, may be interesting.
============================================================================
EMNLP 2016 review:
Title: CharNER: Character-Level Named Entity Recognition
Authors: Onur Kuru, Ozan Arkan Can and Deniz Yuret
Review #1
Appropriateness: 5
Clarity: 4
Originality: 3
Soundness / Correctness: 4
Meaningful Comparison: 5
Substance: 4
Impact of Ideas / Results: 4
Impact of Accompanying Software: 3
Impact of Accompanying Dataset / Resource: 1
Recommendation: 3
Reviewer Confidence: 5
Comments
This paper presents a named entity recognizer in which the entire sentence is encoded as a sequence of characters, and a bidirectional LSTM is used to make predictions. This is unlike previous (and recent) approaches, such as Lample et al. 2016, which presented character-based representations of words and then an LSTM/bidirectional LSTM/stack-LSTM on top of that. This model is similar to the tweet2vec model recently accepted at ACL 2016, even though it tries to solve a different task. They examine a sentence as a sequence of characters and output a tag distribution for each character. This model, like Lample et al., has the potential to be language independent, and they apply it cross-lingually. The motivation and goals are also similar to Lample et al. (remove external features such as gazetteers, etc., and still achieve high results).
Figure 3 does a great job summarizing the entire paper.
In order to avoid inconsistent outputs like "J o h n w o r k s" tagged "P O O O G G G G O", they use a decoder, as in Wang et al. 2015, that applies a transition matrix; at the end they output the entire sequence.
They make a good comparison with related work, but since some of the models are freely available, I'd expect the authors to run them on the languages without public results (such as Arabic or Turkish).
In Table 5 you should definitely differentiate between systems that use gazetteers and neural models that use pretrained word embeddings. They are not the same thing, and as it is presented it might confuse the reader.
This is an interesting paper, but it lacks a bit of novelty given all the previous work that already demonstrated the usefulness of characters and sequential models for NER.
Minor comments: Missing ref (?) in related work.
Onur Kuru and Deniz Yuret. Recognizing Lexical Entailment using
Substitutability. (in preparation).
============================================================================
COLING 2016 Reviews for Submission #378
============================================================================
Title: Recognizing Lexical Entailment using Substitutability
Authors: Onur Kuru and Deniz Yuret
============================================================================
REVIEWER #1
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 4
Readability and clarity: 4
Meaningful comparison: 5
Substance: 3
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 4
Reviewer Confidence: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper proposes a new solution for lexical entailment that brings together
substitutability with entailment. While substitutability has always been tied
in with the definition of lexical entailment, the authors claim that this is
the first attempt to directly model that aspect. They compare their
(unsupervised) approach to other existing approaches, showing respectable
performance across different datasets and settings.
Overall, I like this paper. It is clear to read, and uses simple ideas, but at
the same time it tries to approach the problem from a slightly different point
of view than existing work, and shows that this novel approach does fairly
well. While lexical entailment cannot always be modeled by substitution (for
example, when the word pair has different POS tags), there is definitely some
advantage to tying the two together - and this paper does that much more
explicitly than prior work.
I have some (mostly minor, except point 1) concerns with this work though:
1) The most major concern is a factual error in Table 3 - the numbers for
balAPinc evaluated on KDSZ in the different setting are incorrect (the Turney
and Mohammad paper reports 0.60 AP1 and 0.60 AP0). This possibly invalidates
some of the claims made in this paper regarding the efficacy of balAPinc v/s
Subs.
2) Some of the comparisons in this paper are not apples-to-apples since the
authors in this work do not deal with multi-word expressions and hence they
have to work with only a subset of some of the datasets (specifically the
comparisons on the KDSZ and the Zeichner datasets).
3) Since this is an unsupervised setup, I would have liked a bit more detail on
the experimental setup. Is it k-fold cross validation, or do you just use a
small subset of the dataset to tune thresholds?
4) The error analysis is not very insightful. Instead of reporting performance
across corpora / # of tokens / n in n-gram, I would have liked to see an error
analysis that is specific to the approach, along the lines of the "Substitute
Distributions" section. For example, what is it that makes this model better?
Where does it do better than balAPinc? Are there things that balAPinc gets
right that the Subs model does not?
5) There are some grammatical issues and typos. For instance:
- Page 1, second paragraph, the sentence starting "Since lexical
entailment..." is not grammatical
- Page 6, the line above Table 3, "they only dependent" is incorrect
============================================================================
REVIEWER #2
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 3
Readability and clarity: 4
Meaningful comparison: 4
Substance: 3
Impact of ideas: 2
Impact of resources: 1
Overall recommendation: 2
Reviewer Confidence: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper presents a rather simplistic scheme for inferring whether a lexical
entailment between a pair of words holds. The approach is based on a
probabilistic formulation of word substitutability in contexts, where language
models are used to estimate the probabilities of candidate words occurring in
given contexts. For each context with a blank placeholder, substitute
distributions are computed for the words occurring in that context by computing
the probability of each word in that context and then normalizing.
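The substitute-distribution computation described above can be sketched as follows; this is a minimal illustration, not the paper's implementation, and the toy bigram scores stand in for a real n-gram language model (the paper uses an actual trained LM):

```python
import math

# Toy conditional log-scores standing in for an n-gram language model.
TOY_LM = {
    ("the", "dog"): -1.0, ("dog", "barked"): -0.5,
    ("the", "animal"): -1.5, ("animal", "barked"): -0.7,
    ("the", "car"): -1.2, ("car", "barked"): -6.0,
}

def substitute_distribution(left, right, vocab):
    """For a context 'left _ right', score each candidate word by the
    log-probability assigned to it in the blank, then normalize the
    scores into a probability distribution over substitutes."""
    logs = {w: TOY_LM.get((left, w), -20.0) + TOY_LM.get((w, right), -20.0)
            for w in vocab}
    z = sum(math.exp(s) for s in logs.values())
    return {w: math.exp(s) / z for w, s in logs.items()}
```

For the context "the _ barked", this yields a distribution in which "dog" outranks "animal", which in turn outranks "car".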
The proposed method has been tested on four data sets, and Average Precision
scores for both the entails and does-not-entail classes have been computed.
With the exception of one data set, the proposed approach performs better than
other competing approaches based on WordNet.
Additional Comments:
-- What is the point of equations 1-4 in Section 2, given that they are not
used/referred to later? (2) seems to have a typesetting problem: should the n
in the denominator actually be the summation bound?
-- In your derivation of the P(b|a) approximation, I can probably buy the first
assumption of a and b being independent. I understand why you need the second
assumption mathematically, but is it a reasonable assumption? It could very
well be, but you really need to provide a justifying argument. Also, in the
final equation, are C and C' varying over the same set of contexts?
-- In the references, please make sure your conference names are consistently
named and capitalized (e.g. the first two); also Srilm -> SRILM. Journal names
should all have initial capitals (e.g. Turney et al.), and you should list all
the authors rather than using et al.
============================================================================
REVIEWER #3
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 3
Readability and clarity: 3
Meaningful comparison: 1
Substance: 4
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 3
Reviewer Confidence: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper proposes to address the lexical entailment problem by
directly modeling lexical substitutability: the word "dog" is assumed
to entail the word "animal" if "animal" can replace "dog" in the
contexts where "dog" occurs. This idea is closely related to the
context inclusion hypothesis (dog entails animal if dog occurs in a
subset of contexts where animal occurs), which has been used to design
asymmetric distributional similarity functions to detect lexical
entailment. This paper proposes instead to use n-gram language models
to directly score how often "dog" is a good replacement for "animal"
in contexts drawn from corpora.
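One assumed estimator for such a directional score (for illustration only; the paper's exact formula may differ) averages, over contexts drawn for "dog", the normalized probability that "animal" fills the blank:

```python
import math

def context_logprob(context, word, lm_score):
    """Log-score of 'word' filling the blank in context (left, right),
    using a caller-supplied bigram-style scoring function."""
    left, right = context
    return lm_score(left, word) + lm_score(word, right)

def subs_score(a, b, contexts, vocab, lm_score):
    """Directional substitutability score for 'a entails b': the average,
    over contexts collected for word a, of the normalized probability
    that b is a good substitute in the blank."""
    total = 0.0
    for ctx in contexts:
        logs = [context_logprob(ctx, w, lm_score) for w in vocab]
        z = sum(math.exp(s) for s in logs)
        total += math.exp(context_logprob(ctx, b, lm_score)) / z
    return total / len(contexts)
```

Under this estimator the score is asymmetric by construction, since the contexts are drawn for the left word only.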
This is an interesting idea which is presented clearly and with some
positive empirical results. However, several modeling and experimental
choices should be explained and motivated more thoroughly.
In Section (3), what are the consequences of the simplifying
assumption that context probabilities P(C) are uniform? This should be
discussed since some contexts are clearly more likely than others.
Experiments could be strengthened to include controlled comparisons
with other asymmetric unsupervised methods (such as the approaches
introduced in related work) beyond the random and (symmetric?)
similarity baselines (Table 2). Results published elsewhere are
reported for comparison (Table 3) in an out-of-domain evaluation
setting, but it would be useful to see a comparison with controlled
training conditions.
Selecting good contexts to test how often "dog" can be substituted by
"animal" seems to be a crucial step in the approach introduced here. This
raises the question of how sensitive the approach is to the nature, domain,
amount, and diversity of contexts. Relatedly, what was the motivation for
extracting contexts from the Reuters RCV1 dataset while using distinct
corpora for language modeling?
Other comments:
In Equation (2), $n$ should be on top of the sum in the denominator.
In Section (4), what does the FASTSUBS algorithm do?