Müge Kural and Deniz Yuret. 2024. Unsupervised Learning of Turkish Morphology with Multiple Codebook VQ-VAE. In Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024), pp. 1--17, Bangkok, Thailand and Online, Aug. Association for Computational Linguistics. [ai.ku]
url url abstract google scholar
This paper presents an interpretable unsupervised morphological learning model, showing comparable performance to supervised models in learning complex morphological rules of Turkish as evidenced by its application to the problem of morphological inflection within the SIGMORPHON Shared Tasks. The significance of our unsupervised approach lies in its alignment with how humans naturally acquire rules from raw data without supervision. To achieve this, we construct a model with multiple codebooks of VQ-VAE employing continuous and discrete latent variables during word generation. We evaluate the model's performance under high and low-resource scenarios, and use probing techniques to examine encoded information in latent representations. We also evaluate its generalization capabilities by testing unseen suffixation scenarios within the SIGMORPHON-UniMorph 2022 Shared Task 0. Our results demonstrate our model's ability to distinguish word structures into lemmas and suffixes, with each codebook specialized for different morphological features, contributing to the interpretability of our model and effectively performing morphological inflection on both seen and unseen morphological features.
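As a rough illustration of the multiple-codebook idea described in the abstract (not the authors' released implementation), the sketch below quantizes slices of a continuous encoder vector against separate codebooks with straight-through gradients; all sizes and names are illustrative assumptions.

```python
# Minimal sketch of multi-codebook vector quantization (not the authors' code).
# Each codebook quantizes its own slice of the encoder output; gradients flow
# through the straight-through estimator, as in standard VQ-VAE training.
import torch
import torch.nn as nn

class MultiCodebookVQ(nn.Module):
    def __init__(self, num_codebooks=4, codes_per_book=64, dim=32):
        super().__init__()
        self.dim = dim
        self.codebooks = nn.ModuleList(
            [nn.Embedding(codes_per_book, dim) for _ in range(num_codebooks)]
        )

    def forward(self, z):
        # z: (batch, num_codebooks * dim) continuous encoder output
        chunks = z.split(self.dim, dim=-1)
        quantized, indices = [], []
        for chunk, book in zip(chunks, self.codebooks):
            # nearest codebook entry by Euclidean distance
            d = torch.cdist(chunk, book.weight)       # (batch, codes_per_book)
            idx = d.argmin(dim=-1)                    # discrete code per example
            q = book(idx)
            # straight-through estimator: copy gradients from q to chunk
            quantized.append(chunk + (q - chunk).detach())
            indices.append(idx)
        return torch.cat(quantized, dim=-1), torch.stack(indices, dim=-1)

vq = MultiCodebookVQ()
z = torch.randn(8, 4 * 32)
zq, codes = vq(z)   # zq feeds the decoder; codes are the discrete latents
```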
Ali Hürriyetoğlu, Hristo Tanev, Vanni Zavarella, Jakub Piskorski, Reyyan Yeniterzi, Deniz Yuret and Aline Villavicencio. 2021. Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021): Workshop and Shared Task Report. In Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021), pp. 1--9, Online, Aug. Association for Computational Linguistics. [ai.ku]
url abstract google scholar
This workshop is the fourth edition of a series of workshops on automatic extraction of socio-political events from news, organized by the Emerging Market Welfare Project, with the support of the Joint Research Centre of the European Commission and with contributions from many other prominent scholars in this field. The purpose of this series of workshops is to foster research and development of reliable, valid, robust, and practical solutions for automatically detecting descriptions of socio-political events, such as protests, riots, wars and armed conflicts, in text streams. This year's contributors make use of state-of-the-art NLP technologies, such as deep learning, word embeddings and Transformers, and cover a wide range of topics from text classification to news bias detection. Around 40 teams registered and 15 teams contributed to three tasks: i) multilingual protest news detection, ii) fine-grained classification of socio-political events, and iii) discovering Black Lives Matter protest events. The workshop also highlights two keynotes and four invited talks about various aspects of creating event data sets and multi- and cross-lingual machine learning in few- and zero-shot settings.
Cemil Cengiz, Ulaş Sert and Deniz Yuret. 2019. KU_ai at MEDIQA 2019: Domain-specific Pre-training and Transfer Learning for Medical NLI. In Proceedings of the 18th BioNLP Workshop and Shared Task, pp. 427--436, Florence, Italy, Aug. Association for Computational Linguistics. [ai.ku]
url abstract google scholar
In this paper, we describe our system and results submitted for the Natural Language Inference (NLI) track of the MEDIQA 2019 Shared Task. As the KU_ai team, we used BERT as our baseline model and pre-processed the MedNLI dataset to mitigate the negative impact of de-identification artifacts. Moreover, we investigated different pre-training and transfer learning approaches to improve the performance. We show that pre-training the language model on rich biomedical corpora has a significant effect in teaching the model domain-specific language. In addition, training the model on large NLI datasets such as MultiNLI and SNLI helps in learning task-specific reasoning. Finally, we ensembled our highest-performing models, and achieved 84.7% accuracy on the unseen test dataset and ranked 10th out of 17 teams in the official results.
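Of the steps described in the abstract (domain-adaptive pre-training, fine-tuning on general NLI data, then MedNLI), the final ensembling is the easiest to illustrate in isolation. The hedged sketch below simply averages class probabilities from several fine-tuned NLI classifiers; the `models` list and `batch` format are assumptions, not the authors' released code.

```python
# Hedged sketch of the ensembling step only (not the authors' released code).
# `models` is any list of PyTorch classifiers mapping an encoded premise/
# hypothesis batch to 3-way NLI logits; the batch format is an assumption.
import torch

@torch.no_grad()
def ensemble_predict(models, batch):
    probs = None
    for model in models:
        model.eval()
        logits = model(**batch)                      # (batch_size, 3) NLI logits
        p = torch.softmax(logits, dim=-1)
        probs = p if probs is None else probs + p
    return (probs / len(models)).argmax(dim=-1)      # predicted label ids
```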
Berkay Önder, Can Gümeli and Deniz Yuret. 2018. SParse: Koç University Graph-Based Parsing System for the CoNLL 2018 Shared Task. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 216--222, Brussels, Belgium, October. Association for Computational Linguistics. [ai.ku]
url abstract google scholar
We present SParse, our graph-based parsing model submitted for the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (Zeman et al., 2018). Our model extends the state-of-the-art biaffine parser (Dozat and Manning, 2016) with a structural meta-learning module, SMeta, that combines local and global label predictions. Our parser has been trained and run on Universal Dependencies datasets (Nivre et al., 2016, 2018) and, in our official submission, achieves 87.48% LAS, 78.63% MLAS, 78.69% BLEX and 81.76% CLAS (Nivre and Fang, 2017) on the Italian-ISDT dataset and 72.78% LAS, 59.10% MLAS, 61.38% BLEX and 61.72% CLAS on the Japanese-GSD dataset. All other corpora were evaluated after the submission deadline, and for these we present our unofficial test results.
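For context, the biaffine arc scorer of Dozat and Manning (2016) that SParse extends can be sketched as follows; this is a generic illustration under assumed dimensions, and the SMeta meta-learning module itself is not reproduced here.

```python
# Minimal sketch of a biaffine arc scorer in the style of Dozat and Manning
# (2016); SParse's SMeta module is not shown.
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # the extra column lets the bilinear term absorb a per-head bias
        self.U = nn.Parameter(torch.empty(dim + 1, dim))
        nn.init.xavier_uniform_(self.U)

    def forward(self, head, dep):
        # head, dep: (batch, seq_len, dim) contextual token representations
        ones = torch.ones(*head.shape[:-1], 1, device=head.device)
        head = torch.cat([head, ones], dim=-1)        # (batch, seq_len, dim+1)
        # scores[b, i, j] = score of token j taking token i as its head
        return head @ self.U @ dep.transpose(-1, -2)

scorer = BiaffineArcScorer(dim=16)
h = torch.randn(2, 5, 16)
arc_scores = scorer(h, h)   # (2, 5, 5) head-dependent score matrix
```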
Ömer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-Stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 124--132, Brussels, Belgium, October. Association for Computational Linguistics. [ai.ku]
url abstract google scholar
We introduce the tree-stack LSTM to model the state of a transition-based parser with recurrent neural networks. The tree-stack LSTM does not use any parse-tree-based or hand-crafted features, yet performs better than models with these features. We also develop a new set of embeddings from raw features to enhance performance. There are four main components of this model: the stack's σ-LSTM, the buffer's β-LSTM, the actions' LSTM, and the tree-RNN. All LSTMs use continuous dense feature vectors (embeddings) as input. The tree-RNN updates these embeddings based on transitions. We show that our model improves performance on low-resource languages compared with its predecessors. We participated in the CoNLL 2018 UD Shared Task as the "KParse" team and ranked 16th in LAS and 15th in the BLAS and BLEX metrics among 27 participants parsing 82 test sets from 57 languages.
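A hedged sketch of the overall shape of such a parser state, with separate LSTMs summarizing the stack, the buffer, and the action history before a transition is scored, is given below. It illustrates the general architecture only, not the authors' implementation, and it omits the tree-RNN embedding updates; all sizes and names are assumptions.

```python
# Illustrative parser-state encoder: three LSTMs (stack, buffer, actions)
# feed a linear scorer over transitions. Not the authors' implementation.
import torch
import torch.nn as nn

class ParserState(nn.Module):
    def __init__(self, emb_dim=64, hidden=128, num_actions=3):
        super().__init__()
        self.stack_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.buffer_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.action_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.scorer = nn.Linear(3 * hidden, num_actions)

    def forward(self, stack, buffer, actions):
        # each input: (1, n_items, emb_dim) embeddings of the current configuration
        _, (s, _) = self.stack_lstm(stack)
        _, (b, _) = self.buffer_lstm(buffer)
        _, (a, _) = self.action_lstm(actions)
        state = torch.cat([s[-1], b[-1], a[-1]], dim=-1)   # (1, 3*hidden)
        return self.scorer(state)                          # e.g. shift / left / right

model = ParserState()
scores = model(torch.randn(1, 3, 64), torch.randn(1, 5, 64), torch.randn(1, 2, 64))
```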
Barret Zoph, Deniz Yuret, Jon May and Kevin Knight. 2016. Transfer Learning for Low-Resource Neural Machine Translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1568--1575, Austin, Texas, November. Association for Computational Linguistics. [ai.ku]
url url url annote google scholar
Title: Transfer Learning for Low-Resource Neural Machine Translation
Authors: Barret Zoph, Deniz Yuret, Jonathan May and Kevin Knight
Review #1
Appropriateness: 5
Clarity: 4
Originality: 4
Soundness / Correctness: 4
Meaningful Comparison: 4
Substance: 4
Impact of Ideas / Results: 4
Impact of Accompanying Software: 1
Impact of Accompanying Dataset / Resource: 1
Recommendation: 4
Reviewer Confidence: 4
Comments
Neural machine translation (NMT) performs much worse than phrase-based or syntax-based statistical machine translation (SMT) for low-resource language pairs. This paper adopts transfer learning to improve low-resource NMT. An NMT model is first trained on a rich-resource language pair, and some of the learned parameters are then transferred to the low-resource NMT model. In this way, the low-resource NMT model obtains performance comparable to SMT.
Although the idea is simple, experiments on four low-resource language pairs prove its effectiveness. Moreover, a detailed and nice analysis is included in the paper, discussing the effect of different rich-resource languages, fixing different parts of the model, the learning curve, etc.
The section about the key idea (Section 3) lacks a bit of clarity; I only understood the whole idea after reading Sections 4 and 5. It may help to make clearer in Section 3 exactly how you do transfer learning for NMT. A small question related to this: do you use the same learning rate to train the parent model and the child model? If not, how do you set the learning rate?
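The transfer recipe summarized in the review above (train a parent model on a rich-resource pair, initialize the child model with its parameters, keep some parts fixed while fine-tuning) can be sketched in a few lines. This is a hedged illustration, not the released system; the `src_embed` and `tgt_embed` attribute names are hypothetical, and in the paper the child's source words inherit parent source embeddings via an initially arbitrary vocabulary mapping, which the sketch approximates by simply reusing the copied embedding table.

```python
# Hedged sketch of parent-to-child parameter transfer for NMT.
# Attribute names are hypothetical; this is not the authors' released system.
import torch.nn as nn

def transfer_parameters(parent: nn.Module, child: nn.Module):
    # assumes parent and child share the same architecture and parameter names
    child.load_state_dict(parent.state_dict())

def prepare_child(child, freeze_target_embeddings=True):
    # keep the target-side (English) embeddings fixed while fine-tuning on the
    # low-resource pair; other parameters remain trainable
    if freeze_target_embeddings:
        for p in child.tgt_embed.parameters():   # `tgt_embed` is a hypothetical name
            p.requires_grad = False
```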
Review #2
Appropriateness: 5
Clarity: 5
Originality: 3
Soundness / Correctness: 4
Meaningful Comparison: 4
Substance: 4
Impact of Ideas / Results: 3
Impact of Accompanying Software: 1
Impact of Accompanying Dataset / Resource: 1
Recommendation: 3
Reviewer Confidence: 4
Comments
The paper presents a transfer learning approach for low-resource NMT. The idea is extremely simple: a full-scale parent model is trained and used to initialise a child model with limited resources. The approach seems to be surprisingly effective and the authors convincingly present empirical proof for the use of this method. The final model is still below the performance of the baseline syntax-based SMT model in three out of four language pairs but outperforms the baseline when used for rescoring n-best lists.
It seems surprising that the randomised mapping of word embeddings to the low-resource language word types actually works well enough. It would be interesting to know how much they change during training and how much they differ from training without a parent model (if that can be quantified in some reasonable way). In the analysis part, the authors present a test in which they improve the mapping with a dictionary-based assignment, which sounds much more intuitive. But the benefit of this initialisation is minimal and seems to fade out completely in later training epochs. I'm still puzzled why there shouldn't be any effect from more appropriate mappings. Is there a problem with the dictionary-based assignments?
Another question is whether this approach would also work the other way around. Could you fix the source language embeddings when translating from English into low-resource languages, and would you expect the same kind of improvements? Did you already try this, and would you have anything to report in the reverse direction? This would be more interesting than some of the ablation tests (like 5.2, which seems to argue for the same point as 5.1). In any case, the discussion of the effect of the parent language pair is limited, as it does not systematically evaluate the effect using many language pairs and language families with various kinds of relationships.
I also have a question about the syntax-based model used as the baseline. What is the target syntax you are using in your string-to-tree model and how does it compare to state-of-the-art settings? 26 BLEU for the English-French system sounds low to me, but what is the test set and how does it compare to the state of the art?
Another question would be about evaluations beyond BLEU scores. What happens to the actual translations and what do humans think about them? The limits of automatic evaluation metrics are well known and the authors should discuss translation quality in other terms as well.
Review #3
Appropriateness: 5
Clarity: 5
Originality: 3
Soundness / Correctness: 4
Meaningful Comparison: 5
Substance: 4
Impact of Ideas / Results: 4
Impact of Accompanying Software: 1
Impact of Accompanying Dataset / Resource: 1
Recommendation: 4
Reviewer Confidence: 3
Comments
The paper presents a method for transferring parameters from an NMT model learned on a large data set (fr-en) to a low resource translation task with the same output language (ha-en, tr-en, uz-en and ur-en). Part of the transfer process is selecting which parameters to fine-tune to the low resource translation task.
Although I'm not convinced that the presented method is the best way to leverage the larger data set in a low resource translation setting, I strongly believe that this method needs to be presented to the community.
In the prose you say that French' would be a good parent language for French, and I agree; in the experiment it seems like you flipped them so that French is the parent language, and French' is the child language. I don't know whether this makes any difference in your model at all, but please be consistent.
In Table 7 you report ablation tests, measuring model perplexity and BLEU on the dev set. I'm left with the question of how these two quantities correlate with BLEU on an unseen test set. This question comes up again in the dictionary initialization experiments (Figure 4), where perplexity is again reported, but there is no way to tell how well it correlates with translation performance.
Minor comments
s3p2: "employing w a separate" -> "employing a separate"
Fig1: the colors are really hard to see on a regular office laser printer print-out.
Fig2: I'm guessing that eg "Source Word" and "Source Word" are different source words, and that they may be close to each other, but please use proper indexing to clarify exactly how.
Xing Shi, Kevin Knight and Deniz Yuret. 2016. Why Neural Translations are the Right Length. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2278--2282, Austin, Texas, November. Association for Computational Linguistics. [ai.ku]
url url annote google scholar
Title: Why Neural Translations are the Right Length
Authors: Xing Shi, Kevin Knight and Deniz Yuret
Review #1
Appropriateness: 5
Clarity: 5
Originality: 4
Soundness / Correctness: 5
Meaningful Comparison: 5
Substance: 4
Impact of Ideas / Results: 3
Impact of Accompanying Software: 1
Impact of Accompanying Dataset / Resource: 1
Recommendation: 5
Reviewer Confidence: 4
Comments
This paper convincingly explains why NMT translations are the right length; they show the mechanism (that there are components in the vectors that keep track of length during encoding and decoding) explicitly with a very clear toy example, as well as the tendency in a real-world task. The paper is admirably clearly written and very accessible.
Please keep in mind that some people still print papers out, and stick to a gray scale legible coloring scheme (blue and red are indistinguishable after the gray scale dimensionality reduction has been applied).
s1p1: "covert that vector in a target sentence." -> "convert that vector into a target sentence."
Review #2
Appropriateness: 5
Clarity: 4
Originality: 4
Soundness / Correctness: 4
Meaningful Comparison: 4
Substance: 4
Impact of Ideas / Results: 3
Impact of Accompanying Software: 1
Impact of Accompanying Dataset / Resource: 1
Recommendation: 4
Reviewer Confidence: 3
Comments
This paper investigates the question of how neural MT models manage to produce output of the right length. Starting with a small toy problem and proceeding to an actual neural MT system, the paper shows that (groups of) specific cells are "dedicated" to encoding sentence length.
The paper is refreshingly different from many other NMT papers I've seen lately, in that it attempts to understand what's going on within the neural model, thus addressing a point of criticism that is occasionally brought forward against neural approaches to MT: that they are black boxes and no-one seems to care about what's going on inside.
The paper is well written and generally easy to understand. What I'm missing most is a good motivation for addressing this question. What do we gain from knowing that neural models explicitly encode length? Also, is this behavior consistent across neural models? What happens if we start model training with different random initializations?
A few minor quibbles:
- The images in Figure 2 are too small. You've got plenty of room left (6 pages max.!) to provide bigger images.
- Figures 4 and 5 should be tables, not figures.
Review #3
Appropriateness: 5
Clarity: 5
Originality: 3
Soundness / Correctness: 5
Meaningful Comparison: 4
Substance: 4
Impact of Ideas / Results: 4
Impact of Accompanying Software: 1
Impact of Accompanying Dataset / Resource: 1
Recommendation: 4
Reviewer Confidence: 4
Comments
One of the issues with NMT is the apparent opacity of the model. It is hard to know what is going on inside the black box. The authors start to peel back the curtain here by investigating the question of how NMT models output target sentences of the right length. By looking at both a toy auto-encoder with 4 units and a regular NMT model, they provide interesting insight and show that the model devotes a small handful of units specifically to the token-counting task.
The paper is well written & easy to follow and the insights will be of interest to the EMNLP audience.
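The finding the reviews describe, that a few recurrent units track sentence length, suggests a simple probe: correlate each hidden unit's activation with the number of tokens processed so far. The sketch below is a hedged illustration of that probe on a randomly initialized LSTM; in practice one would run it on the trained NMT encoder or decoder states, and the sizes here are arbitrary assumptions.

```python
# Illustrative length-unit probe (not the authors' code): measure how strongly
# each LSTM hidden unit correlates with the number of tokens read so far.
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(32, 20, 8)                  # 32 sequences of 20 token embeddings
with torch.no_grad():
    h, _ = lstm(x)                          # (32, 20, 16) hidden states

steps = torch.arange(1, 21, dtype=torch.float).repeat(32)   # tokens read so far
acts = h.reshape(-1, 16)                                     # (32*20, 16)
# Pearson correlation between each unit's activation and the step index
a = acts - acts.mean(dim=0)
s = steps - steps.mean()
corr = (a * s[:, None]).sum(dim=0) / (a.norm(dim=0) * s.norm() + 1e-8)
print("most length-like unit:", corr.abs().argmax().item(), corr.abs().max().item())
```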