Giovanni Cavallanti, Nicolò Cesa-Bianchi and Claudio Gentile. 2007. Tracking the best hyperplane with a simple budget perceptron. *Machine Learning*, vol. 69, no. 2-3, pp. 143--167. Springer. (RBP). [perceptron]

The randomized budget perceptron (RBP) discards a uniformly random support vector when the budget is exceeded.
The authors claim performance similar to the Forgetron.
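A minimal sketch of the random discard step (function and variable names are hypothetical; the kernel expansion is represented as a plain list of stored examples):

```python
import random

def rbp_update(support_vectors, new_sv, budget, rng=random):
    """Randomized budget perceptron discard step (sketch).

    On a mistake, the new example becomes a support vector; if the
    budget would be exceeded, a uniformly random stored support
    vector is discarded to make room.
    """
    if len(support_vectors) >= budget:
        # evict a uniformly random stored support vector
        idx = rng.randrange(len(support_vectors))
        support_vectors.pop(idx)
    support_vectors.append(new_sv)
    return support_vectors
```

The uniform choice is what makes the analysis go through in expectation, in contrast to the deterministic oldest-first eviction of the Forgetron.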
Why do we need the norm of the weight vector in the bounds?
Because we use the hinge loss as a surrogate for the number of mistakes,
and even though scaling w does not change the number of mistakes,
it does change the hinge loss.
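This can be checked numerically (a throwaway sketch; the vectors are arbitrary): positive scaling of u leaves the sign of u⊤x, hence the mistake count, unchanged, but it rescales the margin and so changes the hinge loss. A hinge-loss-based bound therefore has to fix the comparator's norm.

```python
def hinge(u, x, y):
    # hinge loss max{0, 1 - y * <u, x>}
    return max(0.0, 1.0 - y * sum(a * b for a, b in zip(u, x)))

u = [0.2, -0.1]
x, y = [1.0, 1.0], 1

# the sign of the prediction is invariant under positive scaling of u...
dot = sum(a * b for a, b in zip(u, x))
assert (dot > 0) == (10 * dot > 0)

# ...but the hinge loss is not: same mistake count, different surrogate loss
print(hinge(u, x, y), hinge([10 * a for a in u], x, y))
```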
SPA (the shifting perceptron algorithm) applies weight decay before each update,
so earlier support vectors are decayed more than later ones.
The decay is polynomial, versus exponential in earlier work.
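A sketch of that update order, assuming a multiplicative shrinking factor applied before the additive perceptron step (the paper uses a schedule for the decay rate; a hypothetical fixed `lam` is used here for simplicity):

```python
def spa_update(w, x, y, lam=0.01):
    """Shifting-perceptron-style update (sketch, hypothetical fixed decay).

    The weight vector is shrunk *before* the additive update, so the
    contribution of a support vector added at mistake k is decayed once
    per subsequent mistake -- earlier support vectors end up decayed
    more than later ones.
    """
    w = [(1.0 - lam) * wi for wi in w]             # decay existing weights
    return [wi + y * xi for wi, xi in zip(w, x)]   # standard perceptron step
```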
The mistake bounds proven are relative to the hinge loss of the best possible weight vector of a given norm:
We measure the performance of our linear-threshold algorithms by the total number of mistakes they make on an arbitrary sequence of examples. In the standard performance model, the goal is to bound this total number of mistakes in terms of the performance of the best fixed linear classifier u ∈ R^d in hindsight (note that we identify an arbitrary linear-threshold classifier with its coefficient vector u). Since finding u ∈ R^d that minimizes the number of mistakes on a known sequence is a computationally hard problem, the performance of the best predictor in hindsight is often measured using the cumulative hinge loss (see, e.g., Freund & Schapire, 1999; Gentile & Warmuth, 1999). The hinge loss of a linear classifier u on example (x, y) is defined by d(u; (x, y)) = max{0, 1 − yu⊤x}. Note that d is a convex function of the margin yu⊤x, and is also an upper bound on the indicator function of SGN(u⊤x) ≠ y.
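The upper-bound claim at the end of the excerpt can be checked directly (a sketch; vectors are plain Python lists, and a zero score is counted as a mistake):

```python
def hinge_loss(u, x, y):
    """d(u; (x, y)) = max{0, 1 - y * <u, x>}."""
    return max(0.0, 1.0 - y * sum(a * b for a, b in zip(u, x)))

def is_mistake(u, x, y):
    """Indicator of SGN(u^T x) != y (zero score counted as a mistake)."""
    return 1 if y * sum(a * b for a, b in zip(u, x)) <= 0 else 0

# the hinge loss upper-bounds the mistake indicator on every example
for u, x, y in [([2.0], [1.0], 1), ([0.5], [1.0], -1), ([0.0], [1.0], 1)]:
    assert hinge_loss(u, x, y) >= is_mistake(u, x, y)
```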
I don't understand the bounds: as the budget gets bigger, shouldn't the expected number of mistakes get smaller?
Fig. 3 shows CKS doing much worse, and the others performing similarly, on the noisy dataset.
...they achieve the highest advantage when the budget size is close to the number of support vectors stored, on average, by their non-budget counterparts on each chunk of the dataset...