Dear Geoff,

I haven't written for a while, but I am still working on decision lists. I wanted to clarify some questions I had regarding the prepend algorithm for decision lists. I have a new implementation based on OPUS and a couple of students trying to replicate your results from the prepend papers. I am interested in playing with the preference function, if you remember, so I would like to figure out exactly how your old preference function works.

Let us consider a rule that matches a total of r instances in the training set. r1 of these instances belong to the class that matches the conclusion of the rule, and r0 of the instances belong to other classes. Thus r = r1 + r0.

Let us also say that there is an existing decision list we have been building using prepend. This decision list classifies d1 of the r instances correctly and d0 of them incorrectly. Thus r = d1 + d0 as well.

Finally, let us consider a decision list we have been building using append. In this case the r instances can be split into three groups: d0 of them are handled incorrectly by the current list, d1 of them are handled correctly, and d2 of them are not yet covered by the existing rules. The difference from prepend is due to the fact that append does not have a default rule until the very end, but prepend does. Thus in the case of append, r = d0 + d1 + d2.

Below is an excerpt from your 93-94 paper, which I will try to interpret according to the variables defined above:

> A positive case with respect to a rule is a case
> belonging to the class identified by the conclusion
> of the rule.

I assume here that we only consider the instances which match the rule's antecedent. In that case, according to this definition, the number of positive cases is r1.

> A negative case with respect to a rule is any case
> that does not belong to the class identified by the
> conclusion of the rule.

Similarly, the number of negative cases is r0.
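For concreteness, here is a minimal sketch of the counts defined above. The representations are my own assumptions, not from your paper: an instance is a (features, label) pair, a rule is an (antecedent, conclusion) pair with the antecedent a predicate on features, and the existing list's behavior is given by a classify() function.

```python
def rule_counts(rule, instances, classify):
    """Count r, r1, r0, d1, d0 for one rule against one training set.

    rule      -- (antecedent, conclusion): predicate on features, class label
    instances -- iterable of (features, label) pairs
    classify  -- the existing decision list as a function features -> label
    (All of these representations are illustrative assumptions.)
    """
    antecedent, conclusion = rule
    matched = [(x, y) for (x, y) in instances if antecedent(x)]
    r = len(matched)
    r1 = sum(1 for (x, y) in matched if y == conclusion)   # positive cases
    r0 = r - r1                                            # negative cases
    d1 = sum(1 for (x, y) in matched if classify(x) == y)  # list already correct
    d0 = r - d1                                            # list already incorrect
    return r, r1, r0, d1, d0

# Tiny example: a constant "neg" classifier standing in for the list.
instances = [({"a": 1}, "pos"), ({"a": 1}, "neg"), ({"a": 0}, "pos")]
counts = rule_counts((lambda x: x["a"] == 1, "pos"), instances, lambda x: "neg")
assert counts == (2, 1, 1, 1, 1)   # r, r1, r0, d1, d0
```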
> To guarantee termination of the algorithm it is
> necessary to terminate the repeat loop when the best
> rule covers less positive than negative
> cases. Otherwise the addition of a rule can increase
> the number of misclassified cases, necessitating the
> development of further rules to allow for those
> cases. Such a process can continue in an infinite
> loop.

I have assumed so far that positive and negative cases are defined only with respect to the rule itself, without considering the existing decision list. Under that reading, this statement claims that we should stop if r1 < r0. If that is the case, I do not agree with the statement. A rule that covers fewer positive than negative cases can decrease the number of misclassified cases (e.g. when all r instances are handled incorrectly by the current decision list), and a rule that covers more positive than negative cases can increase the number of misclassified cases (e.g. when all r instances are handled correctly by the current decision list). It seems to me that if we want to use the decreasing number of misclassified cases as an assurance of termination, we have to include d1 and d0 in the calculation. In particular, we should simply require r1 - d1 > 0.

> The same preference functions may be applied to
> selecting successive rules as for the append
> algorithm, with the difference that the positive
> cover is calculated only from cases not correctly
> covered by the existing decision list and the
> negative cover is calculated only from cases
> correctly covered by the existing decision list.

In the case of append, the rule we add will only apply to the d2 instances that have not yet been covered by the existing decision list. Therefore I do not quite understand your definition that mentions "correctly covered" and "not correctly covered". Those seem to imply that you want to use d0 and d1, which the new rule has no way of altering.
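The counterexamples above can be made concrete with a few lines of arithmetic (my own illustration): in prepend, the new rule is placed first and so takes over all r matched instances, changing the errors among them from d0 (under the old list) to r0 (under the new rule).

```python
def error_change(r1, r0, d1, d0):
    """Change in misclassified cases after prepending the rule.

    The prepended rule takes over all r matched instances, so errors
    among them go from d0 to r0. Note r0 - d0 = d1 - r1, i.e. errors
    decrease exactly when r1 > d1, matching the r1 - d1 > 0 criterion.
    """
    assert r1 + r0 == d1 + d0   # both pairs partition the same r instances
    return r0 - d0              # new errors minus old errors

# Fewer positive than negative cases (r1 < r0), yet errors decrease,
# because the old list got all 10 matched instances wrong:
assert error_change(r1=4, r0=6, d1=0, d0=10) == -4

# More positive than negative cases (r1 > r0), yet errors increase,
# because the old list already got all 10 matched instances right:
assert error_change(r1=6, r0=4, d1=10, d0=0) == 4
```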
Shouldn't the correct expression for both positive and negative cover be "calculated from cases not covered by the existing decision list"? Let r12 and r02 be the numbers of instances not yet covered by the existing decision list that match and do not match the current rule's classification, respectively. Thus d2 = r12 + r02. It is r12 and r02 that we want to use in the preference function.

> When investigating the value of a rule, with respect
> to negative cover, one should not consider cases
> that are already misclassified. Further
> misclassification of cases that are already
> misclassified will not affect the performance of the
> decision list. Similarly, one should not consider
> the correct classification of cases that are already
> correctly classified as this also will not affect
> the performance of the system.

In this paragraph, I assume we are talking about prepend again. My understanding here is that the negative cover should be the size of the intersection of r0 and d1, and the positive cover should be the size of the intersection of r1 and d0. In particular, instances that belong to (r1 and d1) or (r0 and d0) do not affect the performance. If I reinterpret the first paragraph according to this definition of positive and negative cover, we should stop if r1d0 < r0d1. Since r0 + r1 = d0 + d1, this is equivalent to the rule I suggested, i.e. stop if r1 < d1.

Finally, if I understand correctly, the Laplace evaluation function you are using is:

                r1d0 + 1
    value = -----------------
            r1d0 + r0d1 + 2

where r1d0 represents the number of instances correctly classified by the rule but incorrectly classified by the existing decision list, etc. Is this correct?

thanks,
deniz

P.S. Has anybody been working on a probabilistic model for the error rate of decision lists, trees, etc.? Or is it a well-known fact that it is an impossibly difficult problem?
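To make sure we are talking about the same function, here is the Laplace value as I have written it above, taking the positive cover r1d0 and the negative cover r0d1 as inputs (the helper is mine, not from the paper):

```python
def laplace_value(r1d0, r0d1):
    """Laplace preference value as written in the letter above.

    r1d0 -- positive cover: matched instances the rule gets right
            but the existing list gets wrong
    r0d1 -- negative cover: matched instances the rule gets wrong
            but the existing list gets right
    """
    return (r1d0 + 1) / (r1d0 + r0d1 + 2)

# With no evidence either way, the value is the uniform prior 1/2:
assert laplace_value(0, 0) == 0.5
# More positive than negative cover pushes the value above 1/2:
assert laplace_value(8, 2) == 0.75
```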