Net Promoter Score Considered Not Harmful (and How UX Professionals Can Use It to Their Advantage)
Updated: May 14, 2019
Since Jared Spool published his essay, "Net Promoter Score Considered Harmful (and What UX Professionals Can Do About It)" in December 2017, I have seen it tweeted and retweeted a lot, usually with tons of associated comments trashing the Net Promoter Score (NPS).
But when I carefully read the essay and compared it with its cited research, I did not find the specific criticisms compelling. That surprised me. I have read quite a bit of what Spool has written, and sometimes his insights are striking and beautiful (e.g., Consistency in Design is the Wrong Approach). On this topic, however, it seems like he may have made a bold public statement that is consistent with the feelings of many UX designers, but does not withstand careful critical examination. It may also cause UX professionals to miss opportunities to demonstrate how their work connects to this popular corporate metric, widely believed by executives to be a leading indicator of loyalty-based growth (and, as Spool has recently written, UX professionals have The Need to Think and Talk Like an Executive).
What is the Net Promoter Score?
The best source for understanding the NPS and its scoring is the short 2003 Harvard Business Review (HBR) article by Fred Reichheld in which it was introduced (The One Number You Need to Grow). It really is not very long, and if you're interested in this topic, you owe it to yourself to read it. It takes about 15 minutes ... I'll wait ...
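OK, now that you've read the short HBR paper, here's a quick review of what the NPS is. The basis of the NPS is a single-item measure of likelihood-to-recommend (LTR), rated on a scale from 0 to 10: "How likely is it that you would recommend [company X] to a friend or colleague?"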
As described in the HBR paper, respondents are assigned to one of three categories: Promoter (9 or 10), Neutral (7 or 8), and Detractor (0-6). The NPS is the net difference computed by subtracting the percentage of Detractors from the percentage of Promoters. This means that the NPS can take values from -100 (terrible) to +100 (excellent).
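The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not an official implementation: the function names (classify, net_promoter_score) and the sample ratings are my own assumptions, and only the category cutoffs come from the HBR paper.

```python
from collections import Counter


def classify(rating: int) -> str:
    """Assign a 0-10 likelihood-to-recommend rating to an NPS category."""
    if not 0 <= rating <= 10:
        raise ValueError(f"rating must be between 0 and 10, got {rating}")
    if rating >= 9:
        return "promoter"
    if rating >= 7:
        return "neutral"
    return "detractor"


def net_promoter_score(ratings: list[int]) -> float:
    """Percentage of Promoters minus percentage of Detractors (-100 to +100)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(classify(r) for r in ratings)
    # Counter returns 0 for categories that never occurred.
    return 100.0 * (counts["promoter"] - counts["detractor"]) / len(ratings)


# Hypothetical sample: 3 Promoters, 3 Neutrals, 4 Detractors.
sample = [10, 10, 9, 8, 7, 7, 6, 5, 3, 0]
print(net_promoter_score(sample))
```

With this hypothetical sample, 30% Promoters minus 40% Detractors gives an NPS of -10. Neutrals count toward the sample size but not toward either percentage.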
Spool's Specific Criticisms of the NPS -- 12 Claims
If you're reading this blog, you've probably already read the essay that kicked off the discussion. If you haven't, as with the HBR article, you owe it to yourself to read it (Net Promoter Score Considered Harmful). By my count, the essay contains at least 12 problematic claims. To understand why these are problematic, we are sometimes going to have to dig below the surface claims into what the cited literature actually says. Here are the claims (numbered and quoted), each followed by discussion.
1. "Even though NPS has been solidly debunked in many smart research papers, it’s still solidly embedded into many businesses."
Here the essay links to the Wikipedia article's section on criticisms of the NPS. The key criticisms listed in the Wikipedia article are (a) NPS does not add anything to other loyalty-related questions, (b) NPS uses a scale that may not increase predictive validity, (c) single-item questions are less accurate than a composite index of questions, and (d) NPS fails to predict loyalty behaviors.
Note that most of these criticisms do not support a claim of NPS having been "solidly debunked." Even if NPS does not add anything to other loyalty-related questions, the cited articles do not indicate that it is any worse than, for example, a satisfaction question. The Wikipedia article has been selective in the papers chosen to criticize NPS on psychometric grounds (number of response options and single-item vs. composite index of questions). For a recent paper that included a literature review about the number of response options, see User Experience Rating Scales with 7, 11, or 101 Points: Does It Matter? Classical test theory does predict that measurements made up of ratings to multiple items will usually be more reliable than single-item metrics, but there is really no way to assess which is more "accurate."
If the NPS consistently failed to predict loyalty behaviors, that would be a serious problem. A closer reading of the two papers cited for this claim shows that one (Keiningham et al., 2007) reports carefully conducted statistical modeling supporting the use of multiple loyalty metrics (LTR plus retention and repurchase intention) rather than just LTR. Notably, LTR was included in their predictive models, but only as a standard multipoint measure, not converted to NPS. The other cited paper (Pollack & Alexandrov, 2013) concluded, "The paper recommends including the NPI in a portfolio of voice of customer metrics but not as a standalone diagnostic tool. Further, given the present state of evidence, it cannot be recommended to use the NPI as a predictor of growth nor financial performance." So, far from debunking LTR/NPS as a terrible and unreliable measure, these criticisms actually provide some support for its use, either as part of a composite measure or as part of a portfolio of voice of customer metrics.
Pollack and Alexandrov (2013) qualified their criticism with "given the present state of evidence." Over the past year or so, Jeff Sauro at MeasuringU.com has reported a number of interesting studies that have replicated and extended the original NPS research findings. Note that Jared Spool is not a fan of this line of research and has trashed it on Twitter, but I have yet to see a compelling criticism from him of either method or analysis in this research (Twitter does not lend itself to nuanced argument). Jeff has written a number of essays about NPS, all available at MeasuringU.com (e.g., Can UX Metrics Predict Software Revenue Growth?), but let's focus on Assessing the Predictive Ability of the Net Promoter Score in 14 Industries. In that study, the key takeaway was the finding that although NPS was not a perfect predictor, there was a "modest correlation between Net Promoter Scores in 11 of 14 industries for the immediate two-year and four-year future periods. ... This analysis also showed similar (albeit smaller) correlations for the four-year future period (r = .31) and when using relative ranks for the same periods (r = .44 for two-year and r = .29 for four-year growth)."
2. "Any normal statistician would just report on the mean of all the scores they collected from respondents. For reasons never fully explained, NPS doesn’t like the mean average of the numbers they receive. Instead, they segment the scores into three components."
As documented in The One Number You Need to Grow, this was a data-driven decision, not arbitrary. Quoting Reichheld (2003) -- "We then obtained a purchase history for each person surveyed and asked those people to name specific instances in which they had referred someone else to the company in question. ... With information from more than 4,000 customers, we were able to build 14 case studies—that is, cases in which we had sufficient sample sizes to measure the link between survey responses of individual customers of a company and those individuals’ actual referral and purchase behavior. ... The data allowed us to determine which survey questions had the strongest statistical correlation with repeat purchases or referrals. ... One question was best for most industries. ‘How likely is it that you would recommend [company X] to a friend or colleague?’ ranked first or second in 11 of the 14 cases studies. And in two of the three other cases, ‘would recommend’ ranked so close behind the top two predictors that the surveys would be nearly as accurate by relying on results of this single question. ... When we examined customer referral and repurchase behaviors along this [0-10-point] scale, we found three logical clusters. ‘Promoters,’ the customers with the highest rates of repurchase and referral, gave ratings of nine or ten to the question. The ‘passively satisfied’ logged a seven or an eight, and ‘detractors’ scored from zero to six."
Jeff Sauro has also independently investigated the efficacy of this clustering. In Do Detractors Really Say Bad Things about a Company, following an examination of 452 open-ended comments about customers' most recent experience with nine prominent brands and products, he found that Detractors accounted for 90% of negative comments and that an LTR rating of 6 was a good threshold for identifying negative comments.
3. "After all this hard work, we get all sixes: 6, 6, 6, 6, 6, 6, 6, 6, 6, and 6. The average of these ten numbers is 6. But NPS is still -100. For some reason, NPS thinks that a 6 should be equal to a 0. Nobody else thinks this."
The rationale for this is clear from Reichheld (2003) -- if you don't move the rated attitude into the upper two response options, then it might as well be -1 or 0 due to the nonlinear relationship between attitude and behavior when it comes to predicting loyalty behavior. On the other hand, it is reasonable to track LTR means as well as the NPS's percentages of extreme responses, because the mean can reveal UX improvements that the NPS obscures. There's no reason why an enterprise can't walk (track NPS) and chew gum (track mean LTR) at the same time.
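To make the walk-and-chew-gum point concrete, here is a small sketch that reports both numbers from the same ratings. The data are hypothetical and the function names are my own. The "after" sample is the all-sixes case from the essay.

```python
def nps(ratings: list[int]) -> float:
    """Percentage of Promoters (9-10) minus percentage of Detractors (0-6)."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


def summarize(ratings: list[int]) -> dict[str, float]:
    """Report the mean LTR rating alongside the NPS for the same sample."""
    return {"mean_ltr": sum(ratings) / len(ratings), "nps": nps(ratings)}


# Hypothetical ratings before and after a UX improvement.
before = [4, 5, 5, 6, 5, 4, 5, 6, 5, 5]
after = [6] * 10  # the all-sixes example

print(summarize(before))
print(summarize(after))
```

In this made-up example the mean LTR rises from 5.0 to 6.0 while the NPS stays at -100 in both periods, so the mean shows movement that the NPS alone would hide.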
4. "As you can see, the NPS calculation makes little sense. There is no business or mathematical reason for these awkward, abrupt changes in the score."
As explained above, there are both business and mathematical reasons -- abrupt is what you get with nonlinearity (also, see Are Top Box Scores a Better Predictor of Behavior).
5. "NPS uses an 11-point scale. That’s a large scale with a lot of numbers where the distinction isn’t clear. You and I could have the exact same experience, yet I’d give it a 7 and you’d give it a 6. Is there a meaningful difference? We’re somehow supposed to understand the difference between a 6 and a 7. But many respondents don’t. It’s their whim as to what they choose. In NPS, a dataset full only of sixes scores -100 and a dataset full only of sevens scores 0. To NPS, it’s a very big distinction, but to a respondent, it’s just noise. Respondents can’t tell you why they’d pick one over the other."
It's not the individual choice that matters, but rather the aggregate in a large sample of ratings. Again, it's the nonlinear relationship between attitude measured on the 0-10 scale and actual referral and purchase behavior that led to the NPS.
6. "When implementing NPS, we ask each respondent How likely are you to recommend [COMPANY] to a friend or colleague? On the surface, this question seems to be about customer loyalty. In the original HBR article, the author claimed it correlated strongly with repeat purchases and referrals. Later studies show it doesn’t."
"Later studies" links to a Keiningham et al. (2007) study, which was very critical of the NPS. Its criticism, however, was not directed at the analyses of repeat purchases and referrals, but rather at the claim that NPS was better than the ACSI satisfaction score -- "We find no support for the claim that Net Promoter is the 'single most reliable indicator of a company’s ability to grow.' Although we do not have access to the raw data from which these claims were made, we were able to compare some of the exemplar cases of Net Promoter with the ACSI, which Reichheld (2004) reports does not correlate with growth. Instead, we found that when making ‘apples-to-apples’ comparisons, Net Promoter does not perform better than the ACSI for the data under investigation." -- But it also did not perform more poorly. This, along with Sauro's recent studies, suggests that the claim "Later studies show it doesn't" is not accurate.
7. "We see from Dan’s purchase data, his shopper spent the most money ($110) as they responded with an 8. They rated their lowest amount spent ($57.60) with a 9. The order value when they scored a 5 was only $3.00 less than only order they rated a 10. From this data, we see there’s no correlation between shopping behavior and NPS response. Nor is there any evidence of loyalty."
Nine data points is an insufficient sample size to make any claims about NPS or correlation. There's no way to use this data to assess loyalty. Absence of evidence is not evidence of absence.
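To see how little nine data points can tell us, here is a rough sketch of the approximate 95% confidence interval around a sample correlation, using the Fisher z transformation. This is an illustration under stated assumptions: the transformation assumes roughly bivariate normal data, the function name is my own, and the observed r of 0 is a hypothetical stand-in for "no correlation," not a value computed from the essay's data.

```python
import math


def correlation_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson r via Fisher's z."""
    if n <= 3:
        raise ValueError("the Fisher z interval needs more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    z = math.atanh(r)                # Fisher z transformation of r
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical observed correlation of zero at two sample sizes.
for n in (9, 1000):
    low, high = correlation_ci(0.0, n)
    print(f"n={n}: {low:.2f} to {high:.2f}")
```

With n = 9, an observed correlation of zero is compatible with true correlations anywhere from about -0.66 to +0.66, so such a sample cannot rule out even a strong relationship. With n = 1000 the interval shrinks to roughly plus or minus 0.06.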
8. "We’ve learned that NPS doesn’t tell us anything about the customer’s experience or their loyalty. In fact, we can’t trust NPS to tell us anything useful."
This statement was preceded by a couple of anecdotes. Anecdotes are not evidence.
9. "NPS is Easy to Game"
If you’re uninterested in actual measurement and improvement, anything is easy to game.
10. "To these NPS proponents, I tell them that it’s great they are getting this valuable data. Why should they bother with the score question at all? Just ask the qualitative question. Their response is usually some mumbling and huff-puffery about segmentation or indicators or some other mumbo-jumbo that makes no sense."
If you have sufficiently large sample sizes to minimize estimation error and track scores over time, whether NPS, satisfaction, effort, or some other metric, there is value in the exercise. There probably isn't anything especially magical about NPS, but if you already have executive buy-in, why throw it away as long as you're running a valid research program and not trying to game anything?
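As a rough illustration of why sample size matters here, the sketch below estimates the margin of error of an NPS. It treats each response as +1 (Promoter), 0 (Neutral), or -1 (Detractor) and uses a simple normal approximation. The function name and the example proportions (40% Promoters, 20% Detractors) are hypothetical.

```python
import math


def nps_margin_of_error(p_promoters: float, p_detractors: float, n: int,
                        z_crit: float = 1.96) -> float:
    """Approximate margin of error of an NPS, in NPS points."""
    if n <= 0:
        raise ValueError("n must be positive")
    if min(p_promoters, p_detractors) < 0 or p_promoters + p_detractors > 1:
        raise ValueError("proportions must be non-negative and sum to at most 1")
    # Variance of a variable coded +1 / 0 / -1: E[X^2] - (E[X])^2.
    variance = p_promoters + p_detractors - (p_promoters - p_detractors) ** 2
    return 100.0 * z_crit * math.sqrt(variance / n)


# Hypothetical program with an NPS of +20 at two sample sizes.
for n in (50, 1000):
    moe = nps_margin_of_error(0.40, 0.20, n)
    print(f"n={n}: NPS = 20 +/- {moe:.1f} points")
```

Under these assumptions, the same NPS of +20 carries a margin of error of about 21 points with 50 respondents but under 5 points with 1,000, which is why small samples make period-to-period tracking unreliable.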
11. "There are tons of numbers. An infinite number of them, in fact. Yet, there’s no one number that represents a company’s customer experience. Not even NPS. Yet, that won’t stop us from trying. We could use a business number, like the number of subscriptions or the amount of churn. We could use sales, net revenues, or profits. These numbers don’t speak directly to the design of the products or services. They don’t tell us whether the customers are satisfied, or better yet, delighted. That’s what NPS is trying to do, even though it doesn’t come close to succeeding."
This is a strong claim, offered with no compelling evidence that these other questions would work any better or worse than NPS, satisfaction, or effort ratings. Also, see What Is the CE11 (And Is It Better than the NPS).
12. "People who believe in NPS believe in something that doesn’t actually do what they want. NPS scores are the equivalent of a daily horoscope. There’s no science here, just faith."
Another big claim offered with strong polemics, but no compelling evidence. The NPS might not be a perfect measure of customer loyalty or a perfect leading indicator of growth, but as discussed in the points above, its use can be scientifically justified with much more than faith.
How UX Professionals Can Use the NPS to Their Advantage
Is the NPS perfect? Clearly no, and it might not be the only loyalty metric that enterprises should track. Referring back to Reichheld (2003), "Although the 'would recommend' question generally proved to be the most effective in determining loyalty and predicting growth, that wasn’t the case in every single industry. ... Not surprisingly, 'would recommend' also didn’t predict relative growth in industries dominated by monopolies and near monopolies, where consumers have little choice."
But when recommendation behavior is in play, the NPS is clearly not random or useless. For example, there is compelling evidence of a strong correlation (r > 0.60) between perceived usability and the LTR ratings that are used to compute the NPS (e.g., see Predicting Net Promoter Scores from System Usability Scale Scores).
So, if you are working with an enterprise where executives care about NPS (and therefore care about improvements in LTR), you could argue that improvements in perceived usability have a known relationship to improvements in likelihood-to-recommend, making the kind of connection between UX and corporate executive metrics advocated by Jared Spool in The Need to Think and Talk Like an Executive. As a rough estimate, every improvement of 10 SUS points (on its 0-100-point scale) corresponds to an improvement of 1 LTR point (on its 0-10-point scale).
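The rule of thumb above amounts to simple arithmetic, sketched below. The function name and the example SUS scores are hypothetical, and the 10-to-1 ratio is only the rough estimate given in the text, not a fitted regression equation.

```python
def expected_ltr_change(sus_change: float, sus_points_per_ltr_point: float = 10.0) -> float:
    """Rough expected change in mean LTR (0-10 scale) for a change in SUS (0-100 scale)."""
    return sus_change / sus_points_per_ltr_point


# Hypothetical redesign that raises the mean SUS score from 62 to 77.
sus_before, sus_after = 62.0, 77.0
delta = expected_ltr_change(sus_after - sus_before)
print(f"Expected change in mean LTR: about {delta:+.1f} points")
```

For this made-up redesign, a 15-point SUS gain corresponds to roughly 1.5 points of mean LTR. Note that this is a change in the mean rating, not in the NPS itself, which depends on how many respondents cross the Promoter and Detractor thresholds.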
If you're working with an enterprise that uses NPS when recommendation behavior is unlikely, rather than savagely attacking the practice, you'll probably be more successful helping them to understand the limitations of NPS in that setting and, if they are not already doing so, encouraging them to add other loyalty metrics to their voice of the customer program. You might also expose them to the Technology Acceptance Model (see 10 Things to Know about the Technology Acceptance Model), which was originally developed to help information systems researchers understand the key drivers of employee acceptance of business systems. Those key drivers turned out to be Perceived Usefulness and Perceived Ease of Use (and the Perceived Ease of Use component appears to be closely related to measures of perceived usability like the SUS and UMUX-LITE -- see Is the Report of the Death of the Construct of Usability an Exaggeration).
When carefully scrutinized, the claims made in Net Promoter Score Considered Harmful (and What UX Professionals Can Do About It) do not stand up. Not only is the NPS not harmful, but UX professionals who work with enterprises that use the NPS are far more likely to have productive discussions about the value of their UX work if they take advantage of the known relationship between measures of perceived usability and likelihood-to-recommend and are aware of the growing body of research that supports the use of NPS as a corporate metric, especially when users are likely to engage in recommendation behaviors.