Mehmet Ali Yatbaz, Volkan Cirik, Aylin Küntay and Deniz Yuret. 2016. Learning grammatical categories using paradigmatic representations: Substitute words for language acquisition. In COLING, December. [ai.ku, scode]
============================================================================
COLING 2016 Reviews for Submission #383
============================================================================
Title: Learning grammatical categories using paradigmatic representations: Substitute words for language acquisition
Authors: Mehmet Ali Yatbaz, Volkan Cirik, Aylin Küntay and Deniz Yuret
============================================================================
REVIEWER #1
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 2
Originality: 2
Technical correctness / soundness: 2
Readability and clarity: 2
Meaningful comparison: 2
Substance: 2
Impact of ideas: 2
Impact of resources: 1
Overall recommendation: 2
Reviewer Confidence: 3
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper suggests that predicting the syntactic category of a word in
CHILDES corpora works better with a model of the pattern with a slot
(described as "a * b" in the article) than with a model that separates
the pattern heads ('aX') and tails ('Xb') (described as "aX+Xb").
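To make the contrast concrete, the two pattern types can be sketched as follows (a minimal toy illustration of collecting slot fillers jointly for "a * b" versus separately for the 'aX' and 'Xb' contexts; the corpus and counts are invented for illustration and are not the paper's actual models):

```python
from collections import Counter, defaultdict

# Toy corpus; sentences are illustrative only.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]

joint = defaultdict(Counter)  # "a * b": both neighbors jointly -> slot fillers
left = defaultdict(Counter)   # "aX": left neighbor only -> fillers
right = defaultdict(Counter)  # "Xb": right neighbor only -> fillers

for sent in corpus:
    padded = ["<s>"] + sent + ["</s>"]
    for i in range(1, len(padded) - 1):
        a, x, b = padded[i - 1], padded[i], padded[i + 1]
        joint[(a, b)][x] += 1
        left[a][x] += 1
        right[b][x] += 1

# Fillers attested in the joint slot "the * sat" versus fillers attested
# after "the" regardless of what follows: the joint pattern conditions on
# strictly more information than either separate pattern alone.
print(joint[("the", "sat")])
print(left["the"])
```

Because the joint "(a, b)" key conditions on both neighbors at once, it is a strictly more informative context than either one-sided pattern, which is the asymmetry the fairness objection below turns on.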
That "a*b" outperforms "aX + Xb" in prediction seemed obvious to me in
the presented experiment since, as Figure 1 shows, the information used
for prediction in the former is larger than in the latter. In this
sense, the comparison performed in this paper is not a fair one. Maybe
I misread, but if so, the authors must convince the readers that they
are conducting a fair comparison.
Therefore, I did not understand what claim is made in this article, and
my recommendation with respect to presentation at COLING unfortunately
has to be rejection.
Apart from this main point, there are several further issues with this
article.
1. The authors claim that "a*b" corresponds to a "paradigmatic"
representation and "aX+Xb" to a "syntagmatic" one, but the use of these
two terms should be reconsidered. In linguistics, "paradigmatic" and
"syntagmatic" originate in F. de Saussure's definitions, and calling
"a*b"/"aX+Xb" paradigmatic/syntagmatic stretches the terms too far.
2. The authors refer to an EMNLP-CoNLL paper (the last reference) with
similar keywords in its title. If this referenced paper is the authors'
own, then, since the review process must be double-blind, citing it
this way risks disclosing the author identities. If not, it must be
shown what exactly is new in this paper compared with that reference.
3. The authors must explain more about their exact intent in this
work. Do they want to engineer the CHILDES text? Then they must
explain the application needs and situate their work among previous
work in that engineering domain. Or do they want to claim from this
paper that children must be processing grammar through "a*b"? Then
they must explain how their experiment could bear on such a cognitive
claim.
4. Above all, I wonder whether this paper is appropriate for
presentation at a conference of the ACL community. The majority of the
references are in the cognitive-science domain. If a cognitive
question lies behind this paper, the authors would be better served
seeking discussion at a conference dedicated to cognitive science.
============================================================================
REVIEWER #2
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 5
Originality: 3
Technical correctness / soundness: 4
Readability and clarity: 5
Meaningful comparison: 4
Substance: 3
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 4
Reviewer Confidence: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
## General comments
This paper presents an interesting study about the use of paradigmatic
representations for word category learning in the setting of natural language
acquisition. The authors compare two models of word representations --
syntagmatic (where each word is represented based on its neighbors), and
paradigmatic (where the word is represented using substitute words). The
experiments presented here clearly show the benefit of paradigmatic
representations, and this is a very interesting contribution to the field.
However, I'm having a hard time understanding the dichotomy presented between
these two representations: both seem to rely on context-based substitutability
criteria, since even the paradigmatic representation constructs the substitute
words via n-grams, which in turn are created by looking at the neighboring
words of each target word. A related point is that there is a slight
misrepresentation of Harris's distributional hypothesis in the example
presented in the Introduction: in Harris's view, substitutability arises
through the use of all (total) environments of the words/morphemes; it is
therefore "unfair" to compare the substitutes, which have been created from a
very large collection of environments (n-grams), with the local context of
these two sentences.
## Specific issues/comments
The first sentence of your introduction would find an entire section of the
field in disagreement, namely researchers who subscribe to the constructivist
approach who believe that children learn rules about individual words and only
later form abstract syntactic categories.
After the example presented in the introduction, you mention that "the high
probability substitutes reflect both semantic and grammatical properties". As
many papers in the distributional semantics area have shown, this is not
necessarily the case, especially if you select a very small window from which
to create your context representations (something which applies to the n-grams
you use in this work), in which case they mostly reflect syntactic properties.
How do you justify the claim that the Redington et al. method "lacks
completeness"?
There are a few missing references from the part of speech induction literature
that would be interesting to include: Alex Clark's 2003 "Combining
distributional and morphological information for part of speech induction",
Christodoulopoulos et al. 2011 "A Bayesian Mixture Model for PoS Induction
Using Multiple Features" and Blunsom and Cohn's "A Hierarchical Pitman-Yor
Process HMM for Unsupervised Part of Speech Induction".
In the input corpora that you used, did you remove the punctuation marks? They
tend to provide a very strong distributional cue and are sometimes explicitly
removed to better simulate the linguistic input of children.
It is unclear what the contribution of Figure 1 is. Is it theoretically better
or worse to have more units as input to the model? Is there any evidence to
support these theoretical claims?
Given the size of the training corpus for the n-gram model, the comparison to
the syntagmatic models, which are trained on significantly fewer words, seems
problematic.
I would've liked to see a per-tag graph of the number of substitutes for each
target word, since I would think that for some word classes (e.g.,
interjections) this number would be very noisy.
It would be better if you described the connection between the results you
obtain in this work and those of St. Clair et al. 2010.
============================================================================
REVIEWER #3
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Relevance: 4
Originality: 4
Technical correctness / soundness: 4
Readability and clarity: 4
Meaningful comparison: 3
Substance: 3
Impact of ideas: 3
Impact of resources: 1
Overall recommendation: 4
Reviewer Confidence: 3
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper describes a comparison of two approaches to POS tagging, one
(syntagmatic) in which tags are assigned based on n-grams, and one
(paradigmatic) in which tags are assigned based on possible lexical
substitutions. The paper reports the latter (paradigmatic) gives higher
accuracy.
Overall it's an interesting result, but I have some reservations about how
generally the results may be interpreted. First, if this model is to be taken
as an acquisition model, as Section 1 seems to describe, I'm surprised the
model uses supervised learning. Second, given this model is a supervised POS
tagger, the accuracy results seem rather low compared to state-of-the-art
sequence model taggers. This is potentially a problem because the gains due to
paradigmatic modeling may not turn out to be significant in the presence of a
more robust syntagmatic baseline.
Minor comments:
The last term in equation 3 should be "P(w_{n-1}|w^{n-2}_0)" (the zero is
missing).
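For reference, equation 3 presumably expands the n-gram sentence probability by the chain rule; with the reviewer's suggested correction applied to the last factor, it would read (a reconstruction from the review comment, not the paper's typeset equation):

```latex
P(w_0^{n-1}) \;=\; \prod_{i=0}^{n-1} P\!\left(w_i \mid w_0^{\,i-1}\right)
            \;=\; P(w_0)\, P(w_1 \mid w_0)\,\cdots\, P(w_{n-1} \mid w_0^{\,n-2})
```

The point of the correction is that the conditioning history of the final term must start at w_0, i.e. the subscript zero must not be dropped.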
Figures 2 and 3 should use bar graphs, since the slopes of the lines between
columns aren't interpretable.
"patter(n)s" p. 5.
You may want to cite work on syntagmatic and paradigmatic parser acquisition by
Simon Dennis at U. Adelaide in Australia.