There are twenty-two senses of `over' and seven senses of `above' in RIT. All but one of the seven `above'-senses are shared by `over'. `Over' and `above' are two words commonly thought of as synonyms, regardless of context. When describing the position of a light fixture attached to a ceiling, in relation to a table below it, the two words are interchangeable and almost impossible to discriminate.
Consider, on the other hand, the phrase ``the house is over (beyond) the hill.'' Here, `above' cannot be substituted. Similarly, the word `over' cannot be substituted for the word `above' in the sentence ``It is above me.'' (as in, ``It is beyond my comprehension.'').
There is obviously a strong relationship between `above' and `over' but, as demonstrated in the examples, they are not equivalent in all contexts. The question then arises of which synonyms are equivalent in all contexts. Such words will be called word equivalents, and they are the focus of this discussion.
Scrutiny suggests that word equivalents are of several types. Below are examples from the data which have been grouped under different headings in order to make explicit the author's interpretation of these types.
A first pass is a relevancy metric R(w1,w2) between words: the number of senses shared by two words or sets of words, w1 and w2, as a proportion of all the senses of w1. (s(w) denotes the set of senses of word w.)
\[ R(w_1,w_2) = \frac{|s(w_1) \cap s(w_2)|}{|s(w_1)|} \]
This has the advantage that the values range between zero and one, where
R(w1,w2)=0 indicates two semantically disjoint words and R(w1,w2)=R(w2,w1)=1
indicates two equivalent words--with degrees in between.
This metric has the drawback that the relevance of w1 to w2 is often different from the relevance of w2 to w1, because the two words often differ in their degree of polysemy. For example, the relevance of `over' to `above' is R(above,over) = 6/7 = 0.86, while the relevance of `above' to `over' is R(over,above) = 6/22 = 0.27.
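The asymmetry can be checked with a minimal sketch, assuming each word's senses are held as a Python set. The function name `relevance` and the sense identifiers are hypothetical; only the counts (seven senses of `above', twenty-two of `over', six shared) come from the text.

```python
def relevance(senses_w1, senses_w2):
    """Proportion of w1's senses shared with w2: |s(w1) & s(w2)| / |s(w1)|."""
    if not senses_w1:
        raise ValueError("w1 must have at least one sense")
    return len(senses_w1 & senses_w2) / len(senses_w1)

# Hypothetical sense identifiers; only the set sizes mirror the text.
shared = {f"shared_{i}" for i in range(6)}
above = shared | {"above_only_0"}                      # 7 senses
over = shared | {f"over_only_{i}" for i in range(16)}  # 22 senses

r_above_over = relevance(above, over)  # 6/7
r_over_above = relevance(over, above)  # 6/22
print(round(r_above_over, 2), round(r_over_above, 2))  # prints: 0.86 0.27
```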
An alternative, symmetric metric takes as its denominator the total number of distinct senses of both words:
\[ R(w_1,w_2) = \frac{|s(w_1) \cap s(w_2)|}{|s(w_1) \cup s(w_2)|} \]
For both metrics, synonyms that share many senses and have few independent
senses have a higher value, while synonyms that share few senses in relation to
their independent senses score low. The first metric can also identify cases
in which the senses of w1 are a subset of the senses of w2, because R(w1,w2)
is equal to 1 in that case.
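The two metrics can be compared side by side in a short sketch, again assuming senses are stored as sets. The function names and the words `narrow` and `broad` are hypothetical illustrations, not items from the data.

```python
def relevance(senses_w1, senses_w2):
    """First metric: shared senses over the senses of w1 (asymmetric)."""
    return len(senses_w1 & senses_w2) / len(senses_w1)

def symmetric_relevance(senses_w1, senses_w2):
    """Second metric: shared senses over the union of both sense sets."""
    union = senses_w1 | senses_w2
    if not union:
        raise ValueError("at least one word must have a sense")
    return len(senses_w1 & senses_w2) / len(union)

# Counts for `above' and `over' as given in the text: 7, 22, and 6 shared.
shared = {f"shared_{i}" for i in range(6)}
above = shared | {"above_only_0"}
over = shared | {f"over_only_{i}" for i in range(16)}
sym = symmetric_relevance(above, over)  # 6 shared out of 23 distinct senses

# Hypothetical subset case: every sense of `narrow` is also a sense of `broad`.
narrow = {"a", "b"}
broad = {"a", "b", "c", "d"}
subset_first = relevance(narrow, broad)             # 1.0, flags the subset
subset_second = symmetric_relevance(narrow, broad)  # 0.5, does not
```

Under these assumptions the second metric returns the same value in either argument order, while only the first metric reaches 1 for the subset case.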
The results suggest that word equivalents are a very small group. But note that many possible sets of equivalents will have been lost from the data because of homography. For example, any equivalents to `lead' (to guide) will have been corrupted by synonyms of `lead' (the soft metal): `lead' and `guide' can never be equivalent because `guide' will never occur as a synonym of the metal.
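The effect of homography can be illustrated with a small sketch. It assumes that senses are looked up by spelling alone, so that the two homographs of `lead' are merged into one sense set, and that two words count as equivalent when the first metric is 1 in both directions. The sense labels and function names are hypothetical.

```python
def relevance(senses_w1, senses_w2):
    """First metric: shared senses over the senses of w1."""
    return len(senses_w1 & senses_w2) / len(senses_w1)

def equivalent(senses_w1, senses_w2):
    """Equivalence as R(w1,w2) = R(w2,w1) = 1."""
    return relevance(senses_w1, senses_w2) == 1 and relevance(senses_w2, senses_w1) == 1

# Hypothetical sense labels, for illustration only.
guide = {"direct", "conduct"}
lead_verb = {"direct", "conduct"}     # `lead' in the sense "to guide"
lead_metal = {"soft_metal"}           # `lead' the metal
lead_merged = lead_verb | lead_metal  # what a spelling-keyed lookup sees

print(equivalent(guide, lead_verb))    # True: the homographs are kept apart
print(equivalent(guide, lead_merged))  # False: the metal sense breaks the match
```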
Further study will focus on identifying patterns amongst synonym sets like those found amongst word equivalents; it is possible that all synonyms fall under these suggested or similar `type' headings. Etymologists will recognize that some of the culprits in the sample arose from words of common ancestry arriving in the English language at different times or from different sources, e.g. Woden and Odin; Caesar and czar (and Kaiser, if you will).
The metrics suggested are inadequate to account for the richness of relationships between synonyms. A more topological or graphical approach might be preferred. Other approaches are suggested by Kozima and Ito (1996) and Harper (1965).
Finally, the dual of synonymy, polysemy, should be explored. Equivalent senses (senses described by the same words) may shed some light on this area of semantics.
Harper, Kenneth E. (1965). Measurement of Similarity between Nouns. In Proceedings of the International Conference on Computational Linguistics.
RIT3: A relational database version of Roget's International Thesaurus. Edited by W. A. Sedelow, S. Y. Sedelow, L. J. Old, University of Arkansas at Little Rock.