Giovanni Cavallanti, Nicolò Cesa-Bianchi and Claudio Gentile. 2007. Tracking the best hyperplane with a simple budget perceptron. Machine Learning, vol 69, no 2-3, pp 143--167. Springer. (RBP). [perceptron]
pdf annote google scholar
Randomized budget perceptron (RBP) randomly discards a support vector when the budget is exceeded.
Claims performance similar to the forgetron.
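A minimal sketch of that update, assuming a kernel perceptron with labels in {-1, +1} and a fixed budget (this is my reading of the idea, not the paper's pseudocode):

    import random

    def rbp_train(examples, kernel, budget):
        """Kernel perceptron that, RBP-style, evicts a uniformly random
        support vector whenever the stored set exceeds the budget."""
        support = []                                   # stored (x_i, y_i) pairs

        def predict(x):
            score = sum(y_i * kernel(x_i, x) for x_i, y_i in support)
            return 1 if score >= 0 else -1

        for x, y in examples:
            if predict(x) != y:                        # mistake-driven update
                support.append((x, y))
                if len(support) > budget:
                    # randomized budget maintenance: discard a random SV,
                    # not necessarily the oldest one
                    support.pop(random.randrange(len(support)))
        return support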
Why do we need the norm of the weight vector in the bounds?
Because we are using hinge loss as a surrogate for the number of mistakes,
and even though scaling w does not change the number of mistakes,
it does change the hinge loss?
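A quick numeric check of that answer, using the hinge loss d(u; (x, y)) = max{0, 1 − y u⊤x} quoted below (the margin values here are made up):

    def hinge(margin):                    # hinge loss as a function of y * u.x
        return max(0.0, 1.0 - margin)

    margins = [0.3, 0.8, -0.5]            # made-up margins y * u.x under some u
    for c in (1.0, 10.0):                 # compare u against the scaled 10*u
        mistakes = sum(c * m <= 0 for m in margins)        # sign never changes
        losses = [round(hinge(c * m), 2) for m in margins] # hinge loss does change
        print(c, mistakes, losses)
    # c=1:  1 mistake, losses [0.7, 0.2, 1.5]
    # c=10: 1 mistake, losses [0.0, 0.0, 6.0]
    # Same mistake count, different hinge loss, so bounds that use hinge loss
    # as the comparator must fix (or pay for) the norm of u.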
SPA has weight decay before the update.
Earlier SVs are decayed more than later ones.
The decay is polynomial here vs. exponential in earlier work.
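A rough sketch of that decay-before-update step. The λ/(λ + k) schedule below is my reading of the SPA update (check the paper for the exact constants); the point is just that an SV added at mistake k gets multiplied by every later shrink factor, so earlier SVs end up decayed the most:

    def spa_train(examples, lam=1.0):
        """Shifting-perceptron-style training: shrink the current weight
        vector before each mistake update, then add the new example."""
        w, k = None, 0                          # weights and mistake counter
        for x, y in examples:                   # x: list of floats, y in {-1, +1}
            if w is None:
                w = [0.0] * len(x)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin <= 0:                     # mistake
                shrink = 1.0 - lam / (lam + k)  # assumed decay schedule
                w = [shrink * wi + y * xi for wi, xi in zip(w, x)]
                k += 1
        return w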
The mistake bounds proven are relative to the hinge loss of the best possible weight vector of a given norm:
We measure the performance of our linear-threshold algorithms by the total number of mistakes they make on an arbitrary sequence of examples. In the standard performance model, the goal is to bound this total number of mistakes in terms of the performance of the best fixed linear classifier u ∈ R^d in hindsight (note that we identify an arbitrary linear-threshold classifier with its coefficient vector u). Since finding u ∈ R^d that minimizes the number of mistakes on a known sequence is a computationally hard problem, the performance of the best predictor in hindsight is often measured using the cumulative hinge loss (see, e.g., Freund & Schapire, 1999; Gentile & Warmuth, 1999). The hinge loss of a linear classifier u on example (x, y) is defined by d(u; (x, y)) = max{0, 1 − yu⊤x}. Note that d is a convex function of the margin yu⊤x, and is also an upper bound on the indicator function of SGN(u⊤x) ≠ y.
I don't understand the bounds: as the budget gets bigger, shouldn't the expected number of mistakes get smaller?
Fig 3 shows CKS doing a lot worse and the other algorithms performing similarly on the noisy dataset.
...they achieve the highest advantage when the budget size is close to the number of support vectors stored, on average, by their non-budget counterparts on each chunk of the dataset...
Rohit J Kate and Raymond J Mooney. 2007. Learning language semantics from ambiguous supervision. In AAAI, vol 7, pp 895--900. cit 44. [semparse, d=geo]
pdf abstract google scholar
This paper presents a method for learning a semantic parser from ambiguous supervision. Training data consists of natural language sentences annotated with multiple potential meaning representations, only one of which is correct. Such ambiguous supervision models the type of supervision that can be more naturally available to language-learning systems. Given such weak supervision, our approach produces a semantic parser that maps sentences into meaning representations. An existing semantic parsing learning system that can only learn from unambiguous supervision is augmented to handle ambiguous supervision. Experimental results show that the resulting system is able to cope up with ambiguities and learn accurate semantic parsers.
Luke S Zettlemoyer and Michael Collins. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007). Citeseer. cit 121. [semparse, d=atis, spf, d=geo, afz]
pdf abstract annote google scholar
We consider the problem of learning to parse sentences to lambda-calculus representations of their underlying semantics and present an algorithm that learns a weighted combinatory categorial grammar (CCG). A key idea is to introduce non-standard CCG combinators that relax certain parts of the grammar—for example allowing flexible word order, or insertion of lexical items—with learned costs. We also present a new, online algorithm for inducing a weighted CCG. Results for the approach on ATIS data show 86% F-measure in recovering fully correct semantic analyses and 95.9% F-measure by a partial-match criterion, a more than 5% improvement over the 90.3% partial-match figure reported by He and Young (2006).
(*)
Solving the same problem as the ZC05 paper on atis and geo.
Geo, jobs, and restaurant are artificially generated; atis is natural!
New CCG combinators and a new online algorithm make it more flexible with realistic language.
atis exact 1-pass p=.9061 r=.8192 f=.8605
atis exact 2-pass p=.8575 r=.8460 f=.8516
atis partial 1-pass p=.9676 r=.8689 f=.9156
atis partial 2-pass p=.9511 r=.9671 f=.9590
(He and Young 2006 atis partial f=90.3%)
geo880 1-pass p=.9549 r=.8320 f=.8893
geo880 2-pass p=.9163 r=.8607 f=.8876
(ZC05 p=.9625 r=.7929 f=.8695)
Still uses GENLEX with two additional rules.
Still uses initial lexicon with nouns and wh-words!
CCG additions include (toy sketch in code after this list):
1. function application with reverse word order.
2. function composition with reverse word order.
(do we even need the syntactic cats with slashes?)
3. additional type raising and crossed-comp rules that need more careful reading.
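A toy illustration of the relaxed-application idea (the string category encoding and the feature name are mine, not the paper's; real categories also carry lambda-calculus semantics):

    def forward_apply(left, right):
        """Standard CCG forward application: X/Y followed by Y gives X."""
        if "/" in left:
            result, arg = left.split("/", 1)
            if arg == right:
                return result, []                      # no relaxation feature
        return None

    def relaxed_apply(left, right):
        """Relaxed variant: also accept the reversed word order (Y then X/Y),
        firing a feature whose learned weight acts as the cost of using
        the non-standard combinator."""
        standard = forward_apply(left, right)
        if standard is not None:
            return standard
        reversed_order = forward_apply(right, left)    # functor after argument
        if reversed_order is not None:
            return reversed_order[0], ["reversed-application"]  # hypothetical feature
        return None

    print(relaxed_apply("S/NP", "NP"))   # ('S', [])
    print(relaxed_apply("NP", "S/NP"))   # ('S', ['reversed-application'])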
Two important differences from learning algorithm of ZC05:
1. online updates instead of batch.
2. perceptron updates instead of SGD on NLL.
Learning algorithm (rough sketch in code after this list):
1. skip the example if it is parsed correctly with the current lexicon.
2. introduce all GENLEX entries and find the max-score parse with the correct semantics.
3. add the new entries from that max-score parse to the lexicon and try parsing again.
4. if the best parse is still wrong, do a perceptron update toward the correct parse.
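A rough sketch of that loop. The parser, GENLEX, and feature-extraction functions are placeholders passed in as arguments (assumptions, not the paper's code); the lexicon is a set of entries and w a dict of feature weights:

    def zc07_online_step(x, z, lexicon, w, genlex, parse, best_correct_parse, features):
        """One online step over a (sentence x, logical form z) training pair."""
        # 1. skip the example if the current lexicon already parses it correctly
        best = parse(x, lexicon, w)
        if best is not None and best.semantics == z:
            return
        # 2. temporarily add all GENLEX(x, z) entries and find the
        #    highest-scoring parse whose semantics equal z
        correct = best_correct_parse(x, z, lexicon | genlex(x, z), w)
        if correct is None:
            return                                     # nothing to learn from
        # 3. keep only the lexical entries used in that max-score parse
        lexicon |= set(correct.lexical_entries)
        # 4. reparse with the updated lexicon; if the best parse is still
        #    wrong, do a perceptron update toward the correct parse
        best = parse(x, lexicon, w)
        if best is None or best.semantics != z:
            for feat, val in features(correct).items():
                w[feat] = w.get(feat, 0.0) + val
            if best is not None:
                for feat, val in features(best).items():
                    w[feat] = w.get(feat, 0.0) - val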
Lynne Murphy. 2007. Lexical Semantics. Subject Centre for Languages, Linguistics and Area Studies Good Practice Guide. [3D]
url abstract google scholar
The nature of lexical semantics has changed markedly in the twenty-to-thirty years since classic texts like Lyons (1977) and Cruse (1986) were published. Such texts were written at a time when Structuralist lexical semantics essentially carried on separately from major [Generative] theories of grammar. During and since the 1980s, however, theories of grammar have become much more lexically-driven, necessitating much deeper attention to issues of lexical meaning. Unfortunately, there is a tendency in lexical semantics courses and in semantics textbooks to present lexical semantics essentially as it was 30 years ago, with the focus limited to polysemy/homonymy and the ‘nym’ relations (synonym, antonym, etc.). This guide examines ways to construct a modern classroom approach to lexical semantics, with a broader definition of the field.