In a separate blog post that I’ll finish someday, I’m writing about contrastive losses and how they tend to produce “semantically meaningful” embeddings. By “embeddings”, we mean “vectors” (equivalently, points) in some high-dimensional space. These embeddings are also referred to as “representations”, e.g. in the sense of the International Conference on Learning Representations and…that audio thing last year.
“Representations” is a loaded word because it carries the weight of the history of philosophy, most notably Immanuel Kant’s writing on representations, in the sense of “mental representations” of, say, objects in the physical world. One of Kant’s points was that these representations are not the things in themselves, but that the former are all we’d ever know, though…presumably?… we could continue to refine them with experience? [TODO: ask somebody about Kant]
In ML, the idea that the representations are not the things themselves and that they are progressively refined goes without saying.
But I would go further and say that in ML these are not even representations of the things themselves; rather, they are (progressively improving) representations sufficient for the task at hand. This may seem obvious, but in the case of classification it becomes stark: the representation learned is whatever is sufficient for both grouping like items and distinguishing them from dissimilar items in the training set. So, in the case of distinguishing images of cats from those of dogs and horses, one might naively think that the classifier is learning the “essence of (images of) cat-ness”, but it is not. Rather, it is learning the (how to say it?) chain of features-of-features that results in mapping cat images to similar embedding locations and away from the embeddings of other types of images. You say, “Sure, but this is a tautology: you’ve just described what a contrastive loss does again.” Ok, but I’m trying to emphasize that, for example, the dog images have an effect on the learned representations of cat images (because of the task for which the classifier is trained). For example, if we were trying to distinguish between breeds of cats, features like “whiskers”, “slit-shaped irises”, and “triangle-shaped ears” would be nearly useless, whereas for distinguishing cats vs. dogs these features could be useful.
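To make that concrete, here’s a minimal sketch (plain NumPy, with made-up two-dimensional “embeddings” and illustrative names like `cat_anchor` and `dog_negative`, not anyone’s actual training setup) of a triplet-style contrastive loss. The only point is that the loss, and therefore the gradient that shapes the cat embedding, explicitly depends on the dog embedding:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet-style contrastive loss: pull anchor toward the positive,
    push it away from the negative, up to a margin."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor <-> same-class example
    d_neg = np.linalg.norm(anchor - negative)  # anchor <-> other-class example
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical embeddings: a cat image, another cat image, a dog image.
cat_anchor   = np.array([1.0, 0.2])
cat_positive = np.array([0.9, 0.3])
dog_negative = np.array([0.2, 1.0])

print(triplet_loss(cat_anchor, cat_positive, dog_negative))
# If dog_negative moved, the loss surface for cat_anchor would move with it:
# the "dog" examples help determine where the "cat" embeddings end up.
```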
Do humans do anything similar to this? Did Kant talk about representations in “contrastive” ways? In other words, does my mental representation of — geez, pick anything… sure, let’s go with cats — depend on my representation of dogs? I would say yes. We often learn to “bound” our concepts based not just on what they are but on what they are not. Paul, writing to the Corinthians on love, describes both what love is and what it is not. There’s a Veritasium video where the host quizzes people about learning some numerical sequence, and gives extra praise to the interviewees who spent some time testing which values did not fit the sequence. Without asking about these counterexamples, they’d have been wrong. (If we asked Kant, can we know what something is without knowing what its opposite is? What if we asked Hegel? Can we know hate without love, or vice versa?)
This reminds me of a nearest-neighbors classifier, where the decision regions for a given class extend outward infinitely in one or more directions, until new points are added that place limits on those regions.
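Here’s a tiny sketch of that effect (1-nearest-neighbor in plain NumPy, with made-up 2-D points): a query far off to the left keeps getting labeled “cat” no matter how far out it sits, until a new training point caps that region.

```python
import numpy as np

def nn_predict(query, points, labels):
    """1-nearest-neighbor: return the label of the closest training point."""
    dists = np.linalg.norm(points - query, axis=1)
    return labels[np.argmin(dists)]

# Two training points; the "cat" region extends infinitely to the left.
points = np.array([[0.0, 0.0], [2.0, 0.0]])
labels = np.array(["cat", "dog"])
far_left = np.array([-100.0, 0.0])
print(nn_predict(far_left, points, labels))  # "cat", however far out we go

# Adding a third point finally bounds that region.
points = np.vstack([points, [-5.0, 0.0]])
labels = np.append(labels, "horse")
print(nn_predict(far_left, points, labels))  # now "horse"
```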
And make no mistake: classifying on the basis of a contrastive-loss-driven embedding is a nearest-neighbors method, even if the points are generated via some ‘convoluted’ process ;-), and no matter whether Euclidean distance or cosine similarity is used as the metric.
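For instance, here’s a minimal sketch of what that classification step amounts to, assuming we already have embeddings from some trained encoder (the arrays below are made up). Swapping the metric changes the geometry but not the nearest-neighbors character of the method:

```python
import numpy as np

def classify(query_emb, ref_embs, ref_labels, metric="euclidean"):
    """Nearest-neighbor classification over embeddings, with either metric."""
    if metric == "euclidean":
        scores = -np.linalg.norm(ref_embs - query_emb, axis=1)  # higher = closer
    else:  # cosine similarity
        scores = (ref_embs @ query_emb) / (
            np.linalg.norm(ref_embs, axis=1) * np.linalg.norm(query_emb)
        )
    return ref_labels[np.argmax(scores)]

# Hypothetical embeddings produced by some trained encoder.
ref_embs   = np.array([[0.9, 0.1], [0.1, 0.9]])
ref_labels = np.array(["cat", "dog"])
query_emb  = np.array([0.8, 0.3])

print(classify(query_emb, ref_embs, ref_labels, metric="euclidean"))  # "cat"
print(classify(query_emb, ref_embs, ref_labels, metric="cosine"))     # "cat"
```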