A Broken Clock is Right Twice Daily
Class Imbalances
Many real-world datasets involve imbalanced classes, in which there are more instances of one class than another. They often occur, for example, in medical studies, or in comparisons of statistics for racial minorities against those of the majority.
In such situations, our intuitions about how models should behave, our mathematical methods, and even our ‘scoring’ metrics themselves may not be suitable for getting at the truth.
Imagine a situation with 99 members of class A and only one member of class B: for example, a medical screening study in which 99 of 100 patients are healthy and one has a rare disease. A model that predicts class A every time achieves an accuracy of 99%, which looks great, even though it misclassifies the single instance of class B. How we report this therefore depends on our point of view: the model is right 99% of the time overall, yet for members of class B it is wrong 100% of the time.
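To make this concrete, here is a minimal sketch of the accuracy paradox, assuming NumPy and scikit-learn are available; the 99-to-1 split and the always-predict-A ‘model’ mirror the hypothetical above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 99 members of class A (labelled 0) and a single member of class B (labelled 1).
y_true = np.array([0] * 99 + [1])

# A "model" that ignores its input and always predicts class A.
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))             # 0.99 -- looks great
print(recall_score(y_true, y_pred, pos_label=1))  # 0.0  -- class B is never found
```

Accuracy alone hides the failure; per-class measures such as recall expose it immediately.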
When we are faced with imbalanced classes, we need to be more careful than when the classes are balanced. There are several ways to do this, such as resampling the data, weighting the classes, or reporting metrics other than plain accuracy; a small class-weighting sketch follows.
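The snippet below sketches one such tactic, class weighting, using scikit-learn's class_weight="balanced" option to make errors on the rare class cost more during training. The synthetic dataset and the choice of logistic regression are illustrative assumptions, not part of the running example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# A synthetic two-class problem with roughly a 99:1 imbalance.
X, y = make_classification(n_samples=5000, weights=[0.99, 0.01], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# "balanced" re-weights the loss inversely to class frequency,
# so mistakes on the rare class are penalised more heavily.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

# Per-class precision and recall give a far more honest picture than accuracy.
print(classification_report(y_test, model.predict(X_test)))
```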
Another way is to use Bayes’ Theorem [which we discuss in an earlier section], where the prior probabilities $P(A)$ and $P(B)$ naturally weight the class-conditional likelihoods, so that the rarity of class B is built into any prediction we make.
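As a rough worked example, suppose a test for the rare class B has 95% sensitivity and 95% specificity (both figures are assumptions chosen purely for illustration). With priors $P(A) = 0.99$ and $P(B) = 0.01$, Bayes’ Theorem gives

$$
P(B \mid +) = \frac{P(+ \mid B)\,P(B)}{P(+ \mid B)\,P(B) + P(+ \mid A)\,P(A)}
            = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \approx 0.16.
$$

Even a fairly accurate test is wrong about most of its positive calls, precisely because the prior for class B is so small.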
Further reading (Machine Learning Mastery):
- https://machinelearningmastery.com/classification-accuracy-is-not-enough-more-performance-measures-you-can-use/
- https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
- https://machinelearningmastery.com/tour-of-evaluation-metrics-for-imbalanced-classification/