Star ratings systems have, in one form or another, made their way into most aspects of the internet and associated technology. They’re a quick and appealing way to visually gauge how others have rated something, and to then follow suit.
They’re largely born of the five-point Likert scale:
- Strongly Agree
- Agree
- Neutral
- Disagree
- Strongly Disagree
In this sense, the five-point Likert scale deals with discrete data, with each point representing an isolated and unique position on the rating scale.
With the data being discrete, in the strictest sense there should be no averaging of the data points.
For example, if one person selected “Strongly Agree” and another person selected “Agree” then there would be a count of one data point in the “Strongly Agree” category and another count of one data point in the “Agree” category.
An average, such as the mean, should not be used in this instance.
It would be incorrect to add “Strongly Agree” and “Agree” together to get something in between, akin to “Firmly Agree”, even if the loose inference may be there.
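As a minimal sketch in Python (with made-up responses purely for illustration), the statistically safe summary of such discrete data is a count per category, not a mean across categories:

```python
from collections import Counter

# Hypothetical batch of Likert responses, invented for illustration.
responses = ["Strongly Agree", "Agree", "Agree", "Neutral", "Strongly Agree"]

# Tally each discrete category; Counter returns 0 for categories not seen.
counts = Counter(responses)
for category in ["Strongly Agree", "Agree", "Neutral",
                 "Disagree", "Strongly Disagree"]:
    print(f"{category}: {counts[category]}")
```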
It is in light of this that the star ratings systems commonly used on the internet tend to encounter issues. The mapping tends to follow something like:
- 5 Stars = Strongly Agree
- 4 Stars = Agree
- 3 Stars = Neutral
- 2 Stars = Disagree
- 1 Star = Strongly Disagree
It is also not uncommon for the number of stars to represent a numerical scale, making it in many ways like continuous data of sorts.
While there is nothing particularly wrong with turning continuous data into discrete data by discretising it into bins of varying sizes in a histogram, it is not without its challenges. These relate particularly to the size of the bins and how much data then falls into them, along with the binning process itself, whereby the size and edges of the bins can be arbitrary, which can be used to shift the focus of the data.
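As a small sketch of this, with hypothetical scores, the same data binned against two different (equally arbitrary) sets of edges can tell two different stories:

```python
import numpy as np

# Hypothetical continuous scores between 0 and 100.
scores = np.array([18, 22, 39, 41, 58, 62, 79, 81, 95])

# Same data, two arbitrary binnings; the edge choice shifts the focus.
counts_a, _ = np.histogram(scores, bins=[0, 20, 40, 60, 80, 100])
counts_b, _ = np.histogram(scores, bins=[0, 25, 50, 75, 100])

print(counts_a)  # [1 2 2 2 2]
print(counts_b)  # [2 2 2 3]
```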
The corollary of turning discrete data into continuous data is even more problematic; it is often simply incorrect to do so.
However, at least with star ratings systems there is an order to the data (making it ordinal), in that “5 Stars” is better than “4 Stars”, which is better than “3 Stars”, and so on.
Where things become somewhat muddled is in the averaging process typically employed by five-star ratings systems.
The common practice is to take the mean of the star ratings.
For example, assuming there was one rating put into each of the possible star ratings, then the tendency is to do as follows:
5 + 4 + 3 + 2 + 1 = 15
Dividing by a count of five ratings entered gives:
15 / 5 = 3, giving an average rating of “3 Stars”
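A minimal sketch of this common practice, mirroring the example above:

```python
from statistics import mean

# One rating in each of the five star categories, as in the example above.
ratings = [5, 4, 3, 2, 1]

print(f"Average Rating = {mean(ratings)} Stars")  # Average Rating = 3 Stars
```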
In the example, the number was a neat “3 Stars”, but it is not uncommon to see average ratings such as “4.73 Stars”. This is where the mess with assuming continuous data begins to manifest, because the star ratings systems are often treated as scores, mapping somewhat as follows:
- 5 Stars = 100%
- 4 Stars = 80%
- 3 Stars = 60%
- 2 Stars = 40%
- 1 Star = 20%
In which case, taking the average (via the mean), and assuming there was one rating put into each of the possible star ratings, gives:
1 + 0.8 + 0.6 + 0.4 + 0.2 = 3 and 3/5 = 0.6 = 60%
Here, the “average” is not necessarily ‘neutral’ at the 50 percent mark (typically the minimum point at which a test is passed), but has instead been inflated to the better score of 60 percent.
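A sketch of that scoring, using the mapping above (each star rating scored as stars divided by five):

```python
# One rating in each category, scored as stars / 5
# (so 5 -> 100%, 4 -> 80%, ..., 1 -> 20%).
ratings = [5, 4, 3, 2, 1]

scores = [stars / 5 for stars in ratings]
print(f"{sum(scores) / len(scores):.0%}")  # 60% -- above the 50% mark
```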
This ratings inflation can also occur in other ways, such as where those being rated provide guidance on how to rate them: for instance, suggesting that if “satisfied, please rate us 5 stars” and if “unhappy, please rate us 3 stars”. If followed, such guidance yields a bare minimum average rating of 3 stars, or 60 percent, in excess of the neutral 50 percent mark.
Note that in such examples there is no “0 Stars” choice provided. Were one given, then again assuming one rating put into each of the possible star ratings:
1 + 0.8 + 0.6 + 0.4 + 0.2 + 0 = 3 and 3/6 = 0.5 = 50%
This now gives a less biased, and more realistically neutral average point estimate of 50 percent.
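Extending the same sketch with the hypothetical “0 Stars” option included:

```python
# As before, but with a "0 Stars" option mapping to 0%.
ratings = [5, 4, 3, 2, 1, 0]

scores = [stars / 5 for stars in ratings]
print(f"{sum(scores) / len(scores):.0%}")  # 50% -- a neutral midpoint
```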
It may be, however, that people instead treat the typical five possible choices as follows:
- 5 Stars = 100%
- 4 Stars = 75%
- 3 Stars = 50%
- 2 Stars = 25%
- 1 Star = 0%
In which case, taking the average (via the mean), again with one rating put into each of the possible star ratings, gives:
1 + 0.75 + 0.5 + 0.25 + 0 = 2.5 and 2.5/5 = 0.5 = 50%
The “average” is therefore neutral at the 50 percent mark.
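The same sketch again, under this alternative mapping of (stars minus one) divided by four:

```python
# Alternative scoring: (stars - 1) / 4, so 5 -> 100% and 1 -> 0%.
ratings = [5, 4, 3, 2, 1]

scores = [(stars - 1) / 4 for stars in ratings]
print(f"{sum(scores) / len(scores):.0%}")  # 50% -- neutral under this mapping
```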
Yet anecdotally, this seems inconsistent with how people use and treat five-star ratings systems.
At present, the typical response is to rate something “1 Star” and then turn to the comments to further express dissatisfaction, often to the effect of:
“Awful. If I could give it zero stars I would!”
Albeit, often phrased with more colourful language and a lot more exclamation marks.
Another challenge with five-star ratings systems is whether people view the stars as discrete points, or more as each point representing a range of continuous data. Allowing for synonyms, it’s intuitively understood that “1 Star” is “terrible” and “5 Stars” is “excellent”, and that may be enough to be helpful, but it does again become somewhat of a mess with the averaging process typically employed to find the overall average rating for something.
Should the mapping, then, be considered more as follows:
- 5 Stars = 80-100%
- 4 Stars = 60-79%
- 3 Stars = 40-59%
- 2 Stars = 20-39%
- 1 Star = 0-19%
This mapping is not wholly exact, due to rounding and precision with respect to the number of decimal places used, and it also highlights one of the difficulties of discretising data into bins for histograms: namely, what happens to the data at the edges of the bins.
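As a rough sketch of those edges (the mapping function here is an assumption for illustration, not how any particular site works), note the jump from “4 Stars” to “5 Stars” between 79.9 and 80 percent:

```python
# Assumed binning of a continuous 0-100% score into the star ranges above.
def score_to_stars(score: float) -> int:
    return min(int(score // 20) + 1, 5)

for score in (0, 19.9, 20, 59.9, 79.9, 80, 100):
    print(f"{score:5}% -> {score_to_stars(score)} Stars")
```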
Does someone believing something is not “perfect” opt to rate it “4 Stars” instead of “5 Stars”, or, do they choose the “5 Stars” option in the belief that the score of what is being rated would be greater than or equal to 80 percent?
It also introduces additional noise, or variability, into things. This is not to suggest that star ratings systems are otherwise pinpoint accurate; there is a level of imprecision inherent in any measurement.
If there is approximately a 20 percent range within each star rating category, then, with the exception of the boundaries at “1 Star” and “5 Stars”, the “average” rating may be out by as much as a whole star either side.
While all of these issues cannot be perfectly solved, what is being suggested is transitioning, in uthinki, to a rating system using “0-5 Stars”, represented as follows:
- 5 Stars = 100%
- 4 Stars = 80%
- 3 Stars = 60%
- 2 Stars = 40%
- 1 Star = 20%
- 0 Stars = 0%
This, although not perfect and still subject to elements of imprecision, would help to address the discrete versus continuous data issues, including ‘neutral’ averaging, while still allowing the average to be represented to two decimal places, such as “4.73 Stars”, within a visual scale between “0 Stars” and “5 Stars”.
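A final sketch of how such a “0-5 Stars” average might be computed and displayed, with invented sample ratings:

```python
# Proposed "0-5 Stars" system: ratings run 0-5, with 0 Stars = 0%
# and 5 Stars = 100%; the average is shown to two decimal places.
ratings = [5, 5, 4, 5, 3, 0, 5, 5, 4, 5]  # invented for illustration

average_stars = sum(ratings) / len(ratings)
print(f"{average_stars:.2f} Stars ({average_stars / 5:.0%})")  # 4.10 Stars (82%)
```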
Whether the proposed “0-5 Stars” is a better rating system than “1-5 Stars” is open to debate, so comment away below, and help out by answering the uthinki question: