WSJ: wine-rating system is badly flawed

Saturday’s WSJ catches up with Robert Hodgson’s research on the randomness of gold medals in wine competitions. In case you missed our discussion here and many others on them there internets, you can check out the WSJ article for a recap. The story also applies the discussion to wine ratings and scores, underscoring their inherent subjectivity even though pallets of wine are bought and sold every day on these snapshots.

The author, Leonard Mlodinow, wrote a book last year called “The Drunkard’s Walk: How Randomness Rules Our Lives” (amazon; aff). In it, he has a brief section on inherent subjectivity and variation in wine descriptions and ratings. He points out the importance of aggregating several reviews and then expressing the standard deviation with the final score, as in “90 points, plus or minus 6.” Think that would fly in a sales email? Yeah, me neither. But a site of user-generated reviews, such as cellartracker, could easily calculate a mean score and standard deviation from the reviews on any given wine in their database. It may be ever-so-slightly more difficult to read than a single number but it would be a big score for accuracy in reflecting user experiences across diverse settings.
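
For the curious, here is a minimal sketch in Python of what that calculation might look like. The scores are the 16 that a commenter below reports for the Ramey 2006 Chardonnay Hyde Vineyard; the function names `summarize` and `format_score` are made up for illustration and have nothing to do with how cellartracker actually computes anything.

```python
import statistics

# Scores a commenter reports for the Ramey 2006 Chardonnay Hyde Vineyard.
scores = [90, 94, 92, 93, 92, 91, 92, 94, 95, 94, 88, 95, 93, 96, 91, 95]


def summarize(values):
    """Return (mean, median, sample standard deviation) for a list of scores."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),  # sample SD: divides by n - 1
    )


def format_score(values):
    """Render a headline score as 'mean points, plus or minus SD', both rounded."""
    mean, _median, sd = summarize(values)
    return f"{round(mean)} points, plus or minus {round(sd)}"


mean, median, sd = summarize(scores)
print(f"n={len(scores)} mean={mean:.4f} median={median} sd={sd:.3f}")
print(format_score(scores))
```

Run on those 16 scores, this gives a mean of about 92.8 and a sample standard deviation of about 2.1, so the headline would read “93 points, plus or minus 2.”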

Anyway, check out the article. Here’s one quote for the candor file, from the publisher of a magazine that uses scores: “It is absurd for people to expect consistency in a taster’s ratings. We’re not robots.”

“A Hint of Hype, A Taste of Illusion” WSJ
(Image: a reduced size crop of an image credited to Chris Wadden that ran with the story)


33 Responses to “WSJ: wine-rating system is badly flawed”


  1. You don’t have to be a robot to be consistent. You just need criteria for awarding points, puffs, stars, whatever.

    Oh, and it helps to have good chops and this thing called “knowledge”.


  2. Real sweet of the WSJ to publish this right before the holidays. I was afraid that I might have to go through Thanksgiving and Christmas without people throwing “proof” at me that the world of wine is a big scam and that “even the experts can’t tell expensive wine from a combination of grape juice and vodka!”

    I’ve got my own problems with how wine is rated, and I ignore ratings, medals, and hype when I’m selecting a wine. But as articles like this spread around via e-mail and discussion on non-wine blogs, it overshadows the real skill and knowledge that goes into wine appreciation, making the whole enterprise look at best foolish and at worst fraudulent.

    I’m not saying this interesting article shouldn’t have been published… Just let me enjoy the holidays in peace. :)


  3. Give Arthur a Gold Medal, 100 Points, three puffs (they are actually stars regardless of their accepted nickname), ten chopsticks out of ten and a silver dollar.

    It helps to have depth of knowledge to be a good reviewer. Whether or not one cares for any single reviewer, most are fairly consistent whether it is Parker or Jay Miller or Tanzer or any of the rest of us.

    Now, it also helps to have some personal knowledge and awareness so when a reviewer starts handing out 95 ratings to his or her friends or goes native (yes, all Merlots from Long Island are not 90 points) or anti-native (there is not a good Pinot Noir in Oregon or a good Chardonnay in California), you will be able to sense where knowledgeable opinion leaves off and nonsense begins.

    And whoever was quoted in the WSJ was either quoted out of context or has no faith in his or her own opinions and thus should not be foisting them on the world.


  4. Let me propagate a bit of WSET propaganda: Professional wine critics are likely to agree that a wine is dry or sweet, fruity or austere, aromatic or not, having mineral qualities or not. Beyond that, it’s all subjective. There is no crime here. If we were to sit Parker, Tanzer, Robinson, Broadbent, and whomever else down with a particular wine, they more than likely will come up with different aroma/flavor descriptors. This does not mean these people are fraudulent; it means they are human beings, with different histories of tasting different wines and getting different results. To expect anything else is simply unrealistic.

    And in any event–I don’t look to the Wine Spectator for stock tips, and I don’t look to The Wall Street Journal for wine guidance, the estimable John & Dorothy notwithstanding.


  5. Wine ratings can be a vague guide sometimes but are, in the end, completely subjective, and when being used comparatively while shopping at retail, largely a big giggle. On current retail shelves there are really fabulous 87- 87 point cabs and merlots from amazingly well-respected producers from great vintages costing $55-$79.99…. and on the next shelf section, 90 point Chilean cabs or Argentinian Malbecs costing $18.99 on sale for $12.99 ? The ratings come from the same wine personality/tasters within the same 12 month period. Why do some of these supposedly lesser wines taste almost as good or in some cases actually better ? Ratings in wines can only ever be a vague guide.
    Retailers need better sources of information (which they will never use because they own stock) and a more universally understandable consumer system has to be created and agreed by producers who compete with each other to sell their wines… meaning it will never happen !


  6. I have been drinking and being taught about wine for 20+ years and have been to France and Bordeaux too many times. I have been shown and taught many things by the French experts and professionals, and the most important thing they taught me was to taste it for yourself. Never go by what you are told; every tongue is different, even among what they call the snobs and sellouts in France. I have found so many $15 to $30 wines that rival even the biggest, and I won’t even mention those outside of France. As we say here, there is 90% hype to the large and historic wineries. Not to take away from them, but there are far more wines that are better than or the equivalent of theirs for hundreds if not thousands less. A simple class or instruction can help you not be taken in by Parker or other biased critics, or by those chateaus who spend millions influencing those who try to steer you in their direction. Once you are slightly educated about wine, the joke is on them. Chin Chin!!


  7. Actually, doing a statistical analysis of Cellartracker reviews would be quite interesting. Perhaps I should ask one of my programmer friends to hack together a Firefox plugin to do the heavy lifting for us; it’s not like calculating SD is all that hard! “90 points with an SD of 3” is a LOT more meaningful than just “90 points.”


  8. Wine Mule; loved your comment at the end.
    Perfect advice.


  9. The problem, as I see it, is not the rating system per se; it’s the conflation of the nature of tasting with what is, essentially, a marketing system (for wine and critic alike).

    This article was parroted, at least in theory, by a blurb in WIRED about Parker’s inability to accurately identify wines (blind, natch) he’d previously rated. This piece, in particular, is gotcha journalism at its worst.

    Wine is tough enough. We’d all be better off if just one writer would divorce the practice of tasting — a pursuit of knowledge and its continuance — from ratings. It is one thing to celebrate diversity and/or complexity. It’s another to pile on obfuscation, which clearly benefits no one.


  10. Michael–

    I hope I am misunderstanding you. Are you suggesting that tasting notes that recognize diversity and sing the praises of wines that have unique and enjoyable characteristics lose their value if a rating scale is added?

    If so, I would suggest to you that you use the rating only as shorthand for the extensive description and realize that it is the description that enables the rating and that the rating is absolutely useless absent a good description.


  11. Cellar Tracker and Standard Deviation. Putting aside the problem of brand influence on scoring, which is real–CT would be much more helpful if it had a section on blind tastings–you can easily calculate standard deviation through this website: http://easycalculation.com/statistics/standard-deviation.php

    I ran

    Ramey 2006 Chardonnay Hyde Vineyard which received 90,94,92,93,92,91,92,94,95,94,88,95,93,96,91,95

    Results:
    Total numbers: 16
    Mean (average): 92.8125
    Sample standard deviation: 2.136
    Sample variance: 4.5625
    Population standard deviation: 2.06817
    Population variance: 4.27734


  12. So rounding off, an “Enthusiasts Choice” score would be 91-95, yes? Or as Alder Yarrow might put it: White Wines Scoring Between 9.0 and 9.5, which is exactly how he scored the Ramey at the Wine & Spirits tasting of the Top 100 Wines of the Year.


  13. In my next generation user-interface I hope to make the score clickable in the wine detail page and show the standard deviation. I already show the mean and the median.

    Thanks,
    Eric
    -CellarTracker.com


  14. I think, if we’re going to be honest about ratings, that at least 98% of the time people in the industry can tell a great wine from a good one and a good one from an average one etc.

    Like anything there are going to be outliers. As wine drinkers it is a constant process because each vintage and even each bottle is slightly different. To me, that’s part of the joy of wine….if you’re used to beer that is the same every single time….that’s a hard concept to understand for some people I think.


  15. @Benito – Um, yeah, maybe there is an expert or two who actually has some experience and knows something!

    @Michael – got a link for the Wired piece?

    @Eric – Thanks for the info. I had really only used the search results page to skim the info there on wines and hadn’t clicked through to a specific wine page, where now I see the median. Thanks, that’s helpful.

    On another note, have you thought about having users rate other users as happens on rottentomatoes.com? I know that users can list their favorite users and be on a list of other people’s favorites but am curious if you plan on developing that feature more.

    I, for one, am looking forward to seeing the new user interface up and operational!


  16. Dr. Vino,

    In the next-gen UI there is the ability for people to indicate whether a note was helpful or not, and from that I will be able to extrapolate a rating for each rater. I am not yet sure how overt I will be with such aggregate ratings, as I don’t want people to feel judged when there is already a fair amount of ‘stage fright’ involved with posting a tasting note. First and foremost, if a note or the process of writing a note adds value for the authors themselves, that is more important in my mind. And I think most of the time the note will then prove useful for others.

    Trust me, I am looking forward to having the new interface up and running. Still grinding along with 110 hour weeks trying to get it done this year. It is coming together very well.

    BTW, as an aside, thank you for all that you do. Someone noted yesterday that you are turning into the true Nader of the wine industry, breaking stories of importance to consumers left and right.


  17. Thanks to the WSJ and Dr Vino for putting forth real topics that affect us in the wine world. It’s been known for years that rating systems are inherently flawed, mostly due to human beings and our inability to consistently score/rank/rate through a process of elimination. That’s partially why we as a species have been so successful at breeding and evolving.

    I can’t help but chuckle when I go into a tasting room or tasting booth where folks have shiny medals and fancy scores adorning their table when they should be engaging the taster with FACTS… Facts like where the vineyard(s) are. Orientation to sun, trellis, water availability, farming practices- how often do they drop leaves, do they cluster thin at veraison, etc…

    There are some cellar facts that might be relevant too.

    These are the items we should be discussing not the rating of a handful of people who’ve (probably) never worked a day in a vineyard or worked a day racking unfinished wine.


  18. Eric –

    Wow, 110 hrs/week? Congratulations to you on your dedication to the site and keeping it free to users. But when it comes to site design, have you checked because maybe “there’s an app for that.” ;-)

    Thanks for the kind words!

    Cheers,

    Tyler


  19. Tyler,

    I am not proud of my work schedule right now, but it is what I need to do.

    The actual site design has been a yearlong process with a wonderful design firm (Fellswoop) and a great branding guy (YiuStudios). The ‘plan’ on paper is amazing, now I am just in the midst of breathing life into it with lots and lots of code.

    Cheers as well,
    -Eric


  20. Consider this: I entered the Ludwig 2008 Gewurztraminer in three competitions and it won three gold medals, including a double gold at the San Francisco Wine Competition and a four star gold at the Orange County Wine Competition. Possibly wine tasters have a clear and precise characterization for good Gewurztraminer; they have reached a consensus on proper attributes and style and that leads to more consistent tasting results. With the global proliferation of the popular grape varietals, winemakers have had to create every style variation imaginable, including creating wines some might consider flawed, to differentiate themselves in wine society. Obviously not every critic, professional or nonprofessional, will champion all styles and there’s no reason to expect consistent ratings when certain wines are broadly grouped by varietal regardless of style.


  21. I THINK I WILL DEVELOP A WEBSITE FOR USERS TO RATE WINES. I OWN AN INTERNET COMPANY. IT CAN’T BE TOO HARD TO DO THAT. THANKS FOR THE IDEA.


  22. @Dr. Vino, all: My mistake! It was not in WIRED, but written by a WIRED contributor, Jonah Lehrer. The piece is here: http://scienceblogs.com/cortex/2009/10/robert_parker.php

    @Charlie: Mine was not a personal confusion of how to read a rating, but a comment on a too-often conflation of the method/nature of tasting with the rating of a wine for the express purpose of selling it (and the rating “expert” as a by-product).

    It’s squares and rectangles, after a fashion. Some tasting practices can be for the purpose of rating a wine, but that is only a portion of all structured tasting, much of which bears no such intent. For example, tasting for instructional/educational purposes. Or for assessing viability with a menu or within a larger wine program. Or out of sheer anal-retentiveness.

    Pace Lehrer, tasting is doggedly subjective, as science shows. That certainly chips away at the foundation of ratings and rating-givers. But neither is tasting wholly without objectivity. So the argument is not dispositive of wine assessment more broadly. But if and when, as in these two pieces, the distinction between “tasting” and ratings becomes so blurred as to appear part of the same thing, I believe many readers are losing out on the bigger, more nuanced picture (even if it means a good “gotcha!” on Tannin-Pants Parker).


  23. @michael. Your elaboration of the distinction you’ve drawn still seems murky. We are talking about ratings that emerge from tastings. Sure there is a marketing dimension to such activity, but it is not part of the evaluation, it uses the evaluation.

    What makes the “wisdom of crowds” (crowdsourcing) so compelling is the aggregation of various subjective views on a product or service. Yelp.com and Tripadvisor.com, like digg, American Idol and CellarTracker.com, produce an average rating based on the opinions of all sorts of different people, each subject to different variables. With a large enough population of opinions, however, like polling, you get a more objective reading of the merits of something.

    I recommend that readers surf over to http://www.amazon.com/Wisdom-Crowds-Collective-Economies-Societies/dp/0385503865/ and read the comments about James Surowiecki’s “The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations” (Amazon.com user reviews are another solid example of this phenomenon). To hear the author on the subject go to: http://wamu.org/programs/dr/04/07/07.php and http://itc.conversationsnetwork.org/shows/detail468.html


  24. FWIW, I have read Surowiecki and think he is brilliant. That said, if you look closely at the Wisdom of Crowds, you realize that any current community rating system for wine is tainted by the professional reviewers–many of the amateur raters know that Parker rated a wine highly etc. So that bias creates issues.

    Nonetheless, when trying to figure out whether to drink something from my cellar, do I rely on a 15-year-old note from Parker or the 50 notes from fellow wine lovers who have had the same wine in the past year? Pretty easy decision…

    I have always maintained that both professionals and amateurs can add a lot of value in helping people decide which wines to buy and when to drink them. As so many say, your own palate is always your best guide.

    Thanks,
    -Eric
    CellarTracker.com


  25. @Eric. Yes, this influence is real, as I noted above, but not very significant, IMO. Wine enthusiasts, and consumers of other products for that matter, also have an offsetting variable operating: a desire to disagree with the experts. Still, you do need to encourage blind tastings or have a CT section on such.

    -tom merle


  26. @epicuria: Perhaps I am not explaining myself well. I understand your point, agree with it … although it is wholly beside mine. My only contention is with the journalists who conflate tastings which, as you point out, may lead to ratings with tasting in general, which is obviously a much broader category. To me, that confuses the issue and devalues the nature of tasting by associating it, in toto, with any ratings system. (Perhaps an aside to the central issue of ratings per se, but this is a blog, right?!)

    And I’m with you on complex collaborative systems, having built them myself. While crowdsourcing can be a powerful tool, I believe it’s also prone to gaming and source-trust issues (e.g. the variable you mention). In other words, the success of such systems depends in no small part on the verification methods and formal trust model those systems implement to ward off the nutters. And even that doesn’t always work.

    But I do agree with you in spirit as I’m working on such a system right now in re: digital mapping. Another really, really good read is Clay Shirky’s “Here Comes Everybody”. Enjoy!


  27. @michael. I did miss your point first time around. Yes, there should be more tastings that demonstrate differences among varieties as such or as complements to various dishes, regardless of where the wine being used in these ways appeals compared to others in its class or to any other wine(s).

    P.S. On wines that go better with various kinds of foods, I learned today that the reason red wines don’t seem suitable for white fish has not to do so much with tannins, but with the amount of iron in various wines, and for the most part, red wines have more of this chemical element than white wines.


  28. Under the Australian wine judging system, based on 20-point scores, I have been amazed at the consistency of scores across judges, me being one of them for many years. These are wine judges who sit around after each round of wines, often 100-plus wines, and give their scores in a system which does not allow for much fudging. It is common to see no more than a point of difference between the main judges in their 20-point ratings. So what is going on here? To my mind, holding constant the day, the tasting conditions and to some extent the expertise of the tasters will give you consistent scores. The variability in these articles you have been mentioning is surely down to the situation, the environment, the glassware, etc., and not the wine.

    Science is all about constraining some, indeed most, variables and relaxing others. It would seem to me that only a single-event tasting meets that criterion. What makes Robert Parker so successful is that his “single event” scores (e.g. rating a single vintage of Bordeaux releases) are frankly remarkably “accurate” for those who have seen these recommendations mature over many years. I hope his effusive ratings for 2008 Bordeaux turn out to be correct (Robert, tell me it’s true!!). Why would anyone expect a second round of tasting many months later, under different conditions, in a different tasting environment, almost certainly with differing glassware, to produce the same scores?

    And all of those throwing around statistics should go back to Stats 1. What standard deviation means is that any single tasting is likely, 19 times in 20, to fall within plus or minus 2 standard deviations of the “real” rating, if we knew what that was. So a wine with a 90 score and a 2-point SD means anywhere between 86 and 94. Almost every wine ever rated falls into that point range.

    Applying statistics is highly misleading for the reason I mention – this ain’t science, and cannot be science unless we apply many more controls! And hey, let’s not forget the main point – is the wine enjoyable without points and SDs? I just had a Cahors wine tonight called Ch Lazerrote. Just lovely: 14.5% alcohol, but seamless, with cool fresh acidity making this dance on the tongue. Did anyone notice – no point scores? And I am a statistician.


  29. All of this talk is so “sciency,” and much of it above my head, but I’ve been enriched by the discussion. I’m not a fan of rating systems, mostly because so much good wine gets overlooked by consumers if it doesn’t have a point score attached, which is really unfortunate. I work in a gourmet food and wine retail shop, where, luckily, those of us who love and try many wines will try to steer customers to great, but unrated, values. But then there are those customers who only shop for wine by points, missing out on a whole world of great wine. Oh well, what’dya gonna do?


  30. [...] “WSJ: wine-rating system is badly flawed” [...]


  31. [...] recently, DrVino.com acknowledges that the wine system is badly flawed with a link to the Wall Street Journal article, “A Hint of Hype, A Taste of Illusion” by [...]


  32. Dad take a look at this.


  33. @epicuria we can also use this standard deviation calculator http://ncalculators.com/statistics/mean-standard-deviation-calculator.htm
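
A footnote on the “plus or minus 2 standard deviations” rule of thumb raised in comment 28: the sketch below, in Python, computes that band for a 90-point wine with a 2-point SD. The coverage figure assumes ratings are normally distributed around the score, which is an assumption for illustration and not something the thread establishes; `two_sd_range` is a made-up helper name.

```python
from statistics import NormalDist


def two_sd_range(score, sd):
    """Return the (low, high) band of score plus or minus two standard deviations."""
    return (score - 2 * sd, score + 2 * sd)


low, high = two_sd_range(90, 2)

# Assumption: individual ratings follow a normal distribution around the score.
ratings = NormalDist(mu=90, sigma=2)
coverage = ratings.cdf(high) - ratings.cdf(low)

print(f"band: {low} to {high}, share of ratings inside: {coverage:.3f}")
```

Under that assumption the band runs from 86 to 94 and holds roughly 95% of ratings, which is the “19 in 20” figure quoted in the comment.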

