Volkan Cirik, Louis-Philippe Morency and Deniz Yuret. Context Vectors using Substitute Word Distributions. (in preparation). [ai.ku]
Title: Context Embeddings using Substitute Word Distributions
Authors: Volkan Cirik, Louis-Philippe Morency and Deniz Yuret
Review #1
Appropriateness: 5
Clarity: 5
Originality: 3
Soundness / Correctness: 4
Meaningful Comparison: 3
Substance: 3
Impact of Ideas / Results: 3
Impact of Accompanying Software: 1
Impact of Accompanying Dataset / Resource: 1
Recommendation: 3
Reviewer Confidence: 5
Comments
This work proposes to compute context embeddings based on the embeddings of substitute words. For a given context, a list of substitute words is first collected. Then the weight of each substitute word is calculated based on a language model. Finally, the context embedding is computed as the weighted sum of all the substitute word embeddings. The authors concatenate the context embedding with the word embedding to represent a word in that context. Experiments on a POS tagging task show the effectiveness of this method.
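In code, the pipeline summarized above looks roughly like the following (a minimal sketch; the function and variable names are illustrative assumptions, not the authors' implementation):

    # Minimal sketch of the described method; names and inputs are illustrative assumptions.
    import numpy as np

    def context_embedding(substitute_probs, word_vectors, dim=50):
        """Context embedding = probability-weighted sum of substitute word embeddings.
        substitute_probs: dict mapping substitute word -> probability (from a language model)
        word_vectors:     dict mapping word -> pre-trained embedding (numpy array of length dim)
        """
        ctx = np.zeros(dim)
        for word, prob in substitute_probs.items():
            if word in word_vectors:          # skip substitutes without a pre-trained vector
                ctx += prob * word_vectors[word]
        return ctx

    def token_representation(word, substitute_probs, word_vectors, dim=50):
        """Concatenate the type-level word embedding with the token-level context embedding."""
        w = word_vectors.get(word, np.zeros(dim))
        c = context_embedding(substitute_probs, word_vectors, dim)
        return np.concatenate([w, c])         # 2 * dim features, e.g. 100 when dim = 50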
The method is simple, and makes sense. However, I have some questions about the experiment part.
(1) In Table 2, the number of features input to LIBSVM in the “word embedding” column is 50, whereas it is 100 in the last column. Could the higher performance simply be due to the larger number of features?
(2) Your word embeddings are pre-trained on a much larger corpus. Is this the reason you got better performance in Table 4?
(3) I am wondering what the results would be if you used only the context embedding instead of concatenating it with the original word embedding.
Review #2
Appropriateness: 5
Clarity: 4
Originality: 3
Soundness / Correctness: 5
Meaningful Comparison: 4
Substance: 4
Impact of Ideas / Results: 4
Impact of Accompanying Software: 1
Impact of Accompanying Dataset / Resource: 1
Recommendation: 4
Reviewer Confidence: 4
Comments
The paper presents an extension of Yatbaz et al. (2012), which introduced substitute vectors for word context representations, and elegantly defines so-called context embeddings.
The context embeddings possess a few interesting properties: they are adaptable to different word embedding methods, and are thus universal with respect to the input data; they naturally extend any word embedding method; and, most importantly, they are able to improve the results of word embeddings by making it possible to differentiate word sense occurrences. The improvement is documented in the paper by increased POS tagging accuracy for three word embedding models and by state-of-the-art results on 5 unsupervised POS induction tasks in several languages (the other 5 results were comparable to the state of the art).
Comments and questions:
The context of size 2n+1 for the substitute vectors uses only n-grams in the computation - why? Word2vec uses full word contexts for word embeddings; shouldn't full contexts be used here too, instead of Markov estimates?
The text uses both p() and P() for probability - is there a difference? If not, they should be unified.
Table 4 shows higher values for "Our method" in two more cases (Bulgarian CoNLL-X and Turkish CoNLL-X); these are, however, not bold. Why?
The related work could be expanded with other recent results on "context embedding" computations with similar "independence" qualities, e.g.:
- Instance-context embeddings - Kågebäck, Mikael, et al. "Neural context embeddings for automatic discovery of word senses." Proceedings of NAACL-HLT. 2015.
- Vu, Thuy, and D. Stott Parker. "K-Embeddings: Learning Conceptual Embeddings for Words using Context." Proceedings of NAACL-HLT. 2016.
However, the context embeddings method from the current paper can be regarded as more transparent.
Review #3
Appropriateness: 5
Clarity: 4
Originality: 3
Soundness / Correctness: 4
Meaningful Comparison: 2
Substance: 3
Impact of Ideas / Results: 3
Impact of Accompanying Software: 1
Impact of Accompanying Dataset / Resource: 1
Recommendation: 3
Reviewer Confidence: 4
Comments
The paper describes an approach to learn embeddings of words in context and uses these embeddings to tackle POS tagging problems. It combines the concept of substitute word distributions (i.e. the probability with which other words can replace a word in context) with traditional word embeddings (e.g. Collobert & Weston 2011). The contextual word embedding is the concatenation of the original embedding and the weighted sum of the top-K substitute word embeddings, where the weights are determined by a statistical 4-gram language model. On supervised POS tagging the system achieves an accuracy of 96.7%, which is close to the state of the art. On unsupervised POS tagging the system beats the state of the art on 5 languages.
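Concretely, the concatenated representations can be fed to a linear classifier, roughly in the spirit of the LIBSVM setup mentioned in Review #1; the snippet below is illustrative only, with scikit-learn's LinearSVC as a stand-in rather than the authors' actual tool chain:

    # Illustrative only: a linear classifier over the concatenated representations.
    import numpy as np
    from sklearn.svm import LinearSVC

    def train_pos_tagger(token_vectors, tags):
        """token_vectors: one concatenated vector per token (e.g. 50 word-embedding
        dimensions + 50 context-embedding dimensions = 100 features); tags: gold POS tags."""
        X = np.vstack(token_vectors)
        clf = LinearSVC()
        clf.fit(X, tags)
        return clf

    def predict_tags(clf, token_vectors):
        return clf.predict(np.vstack(token_vectors))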
Overall, the combination of an n-gram language model with word embeddings to determine contextual embeddings seems novel.
Although the supervised tagging results are not very positive, the unsupervised tagging results seem promising. However, it is unclear how impactful unsupervised tagging results are when typical downstream applications require a much higher accuracy, which can be achieved through supervised or semi-supervised techniques.
The other issue I had with the paper was that the way the contextual embeddings are inferred is unsatisfying. Instead of learning such embeddings from first principles, the embeddings are just a weighted sum of the embeddings of other words that appear in a similar context. In fact, the authors seem unaware of recent work on learning a different embedding for each sense of a given word (e.g. Iacobacci et al.'s work at EMNLP'15 and others). It would be very helpful to compare against such approaches.
Finally, the authors' claim that state-of-the-art supervised POS tagging results require hand-engineered features is misleading. There are several papers that use word embeddings and neural networks without hand-engineering to achieve >97.2% accuracy on the WSJ corpus (e.g. see Collobert & Weston 2011 or dos Santos et al., ICML'14).
Onur Kuru and Deniz Yuret. Recognizing Lexical Entailment using Substitutability. (in preparation). [ai.ku]
============================================================================
COLING 2016 Reviews for Submission #378
============================================================================
Title: Recognizing Lexical Entailment using Substitutability
Authors: Onur Kuru and Deniz Yuret
============================================================================
REVIEWER #1
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 4
Readability and clarity: 4
Meaningful comparison: 5
Substance: 3
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 4
Reviewer Confidence: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper proposes a new solution for lexical entailment that brings together
substitutability with entailment. While substitutability has always been tied
in with the definition of lexical entailment, the authors claim that this is
the first attempt to directly model that aspect. They compare their
(unsupervised) approach to other existing approaches, showing respectable
performance across different datasets and settings.
Overall, I like this paper. It is clear to read, and uses simple ideas, but at
the same time it tries to approach the problem from a slightly different point
of view than existing work, and shows that this novel approach does fairly
well. While lexical entailment cannot always be modeled by substitution (for
example, when the word pair has different POS tags), there is definitely some
advantage to tying the two together - and this paper does that much more
explicitly than prior work.
I have some (mostly minor, except point 1) concerns with this work though:
1) The most serious concern is a factual error in Table 3 - the numbers for
balAPinc evaluated on KDSZ in the different setting are incorrect (the Turney
and Mohammad paper reports 0.60 AP1 and 0.60 AP0). This possibly invalidates
some of the claims made in this paper regarding the efficacy of balAPinc vs.
Subs.
2) Some of the comparisons in this paper are not apples-to-apples since the
authors in this work do not deal with multi-word expressions and hence they
have to work with only a subset of some of the datasets (specifically the
comparisons on the KDSZ and the Zeichner datasets).
3) Since this is an unsupervised setup, I would have liked a bit more detail on
the experimental setup. Is it k-fold cross validation, or do you just use a
small subset of the dataset to tune thresholds?
4) The error analysis is not very insightful. Instead of reporting performance
across corpora / number of tokens / n in n-gram, I would have liked to see an
error analysis that is specific to the approach, along the lines of the
"Substitute Distributions" section. For example, what is it that makes this
model better? Where does it do better than balAPinc? Are there things that
balAPinc gets right that the Subs model does not?
5) There are some grammatical issues and typos. For instance:
- Page 1, second paragraph: the sentence starting "Since lexical
entailment..." is not grammatical
- Page 6, the line above Table 3: "they only dependent" is incorrect
============================================================================
REVIEWER #2
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 3
Readability and clarity: 4
Meaningful comparison: 4
Substance: 3
Impact of ideas: 2
Impact of resources: 1
Overall recommendation: 2
Reviewer Confidence: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper presents a rather simplistic scheme for inferring whether a lexical
entailment relation between a pair of words holds. The approach is based on a
probabilistic formulation of word substitutability in contexts, where language
models are used to estimate the probabilities of candidate words occurring in
given contexts. For each context with a blank placeholder, substitute
distributions are computed for the words occurring in that context by computing
the probability of each word in that context and then normalizing.
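The computation the reviewer summarizes can be sketched as follows (a schematic illustration; the language-model scorer lm_score is an assumed callable, not part of the paper's actual toolchain):

    # Schematic sketch of a substitute distribution for one blanked context.
    import math

    def substitute_distribution(left_context, right_context, vocabulary, lm_score):
        """lm_score(tokens) is assumed to return the log-probability of a token sequence,
        e.g. from an n-gram language model. Scores for every candidate word filling the
        blank are exponentiated and normalized into a probability distribution."""
        log_scores = {w: lm_score(left_context + [w] + right_context) for w in vocabulary}
        max_log = max(log_scores.values())                   # for numerical stability
        unnorm = {w: math.exp(s - max_log) for w, s in log_scores.items()}
        z = sum(unnorm.values())
        return {w: p / z for w, p in unnorm.items()}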
The proposed method has been tested on four data sets, and Average Precision
scores for both the entails and does-not-entail classes have been computed.
With the exception of one data set, the proposed approach performs better than
other competing approaches based on WordNet.
Additional Comments:
-- What is the point of equations 1-4 in Section 2, given that they are not
used/referred to later? (2) seems to have a typesetting problem: shouldn't the
n in the denominator actually be the summation bound?
-- In your derivation of the P(b|a) approximation, I can probably buy the first
assumption of a and b being independent. I understand why you need the second
assumption mathematically, but is it a reasonable assumption? It could very
well be, but you really need to provide a justification argument. Also, in the
final equation, do C and C' vary over the same set of contexts?
-- In the references, please make sure your conference names are consistently
named and capitalized (e.g. the first two); also Srilm -> SRILM. Journal names
should all be capitalized (e.g. in the Turney et al. entry), and you should
list all the authors rather than using et al.
============================================================================
REVIEWER #3
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 3
Readability and clarity: 3
Meaningful comparison: 1
Substance: 4
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 3
Reviewer Confidence: 5
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper proposes to address the lexical entailment problem by
directly modeling lexical substitutability: the word "dog" is assumed
to entail the word "animal" if "animal" can replace "dog" in the
contexts where "dog" occurs. This idea is closely related to the
context inclusion hypothesis (dog entails animal if dog occurs in a
subset of contexts where animal occurs), which has been used to design
asymmetric distributional similarity functions to detect lexical
entailment. This paper proposes instead to use n-gram language models
to directly score how often "dog" is a good replacement for "animal"
in contexts drawn from corpora.
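A rough sketch of that idea, under the simplifying assumption that the entailment score is the average substitute probability of the candidate word over contexts of the narrower word (not necessarily the paper's exact scoring function):

    # Rough sketch only; the averaging scheme here is an assumption, not the paper's formula.
    def substitutability_score(candidate, substitute_dists):
        """substitute_dists: one substitute distribution (dict word -> probability) per
        context in which the potentially entailing word (e.g. "dog") occurs."""
        if not substitute_dists:
            return 0.0
        return sum(d.get(candidate, 0.0) for d in substitute_dists) / len(substitute_dists)

A pair would then be labelled "entails" when the score exceeds a threshold tuned on held-out data; for example, substitutability_score("animal", dists_for_dog) should exceed substitutability_score("cat", dists_for_dog) if the method works as intended (dists_for_dog being a hypothetical list of substitute distributions for contexts of "dog").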
This is an interesting idea which is presented clearly and with some
positive empirical results. However, several modeling and experimental
choices should be explained and motivated more thoroughly.
In Section (3), what are the consequences of the simplifying
assumption that context probabilities P(C) are uniform? This should be
discussed since some contexts are clearly more likely than others.
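For concreteness, one plausible reading of where that assumption enters (a reconstruction for illustration, not quoted from the paper) is, in LaTeX:

    P(b \mid a) \;=\; \sum_{C} P(b \mid C)\, P(C \mid a)
                \;=\; \sum_{C} P(b \mid C)\, \frac{P(a \mid C)\, P(C)}{\sum_{C'} P(a \mid C')\, P(C')}
                \;\approx\; \sum_{C} P(b \mid C)\, \frac{P(a \mid C)}{\sum_{C'} P(a \mid C')}

where the last step assumes P(C) is uniform, i.e. frequent and rare contexts are weighted identically.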
Experiments could be strengthened to include a controlled comparison with other
asymmetric unsupervised methods (such as the approaches introduced in related
work) beyond the random and (symmetric?) similarity baselines (Table 2).
Results published elsewhere are reported for comparison (Table 3) in an
out-of-domain evaluation setting, but it would be useful to see a comparison
under controlled training conditions.
Selecting good contexts to test how often "dog" can be substituted by "animal"
seems to be a crucial step in the approach introduced here. This raises the
question of how sensitive the approach is to the nature, domain, amount, and
diversity of the contexts. Relatedly, what was the motivation for extracting
contexts from the Reuters RCV1 dataset and using distinct corpora for language
modeling?
Other comments:
In Equation (2), $n$ should be on top of the sum in the denominator.
In Section (4), what does the FASTSUBS algorithm do?