Levitt writes:

"Not until I read Goldstein's book did I realize just how weak the correlation was, in blind tastings in experimental settings, between expert evaluations and price. Yet, somehow, Wine Spectator, which claims to do its tastings blind (at least with respect to who the producer is), has an extremely strong positive correlation between prices and ratings. Hmmm … seems a bit suspicious."

I need to just read Goldstein's book, I guess. But in that previous article, presented at the American Association of Wine Economists, Goldstein and his coauthors actually find a strong, positive correlation between price and enjoyment for the tasters they classified as "experts." So I'm not sure what Levitt is talking about here. The negative correlation between price and preference holds only for the non-experts. People who have taken a class on wine, or who might otherwise be categorized as experts, prefer the more expensive wines even in blind taste tests (even though they don't know which wines those are).
To me, this is really, really simple. There are increasing returns to education and experience in the appreciation of wine. That is, there is a learning component: the more wine you drink, the more discerning your palate becomes. To take it even further, the more you drink of it, the more you come to prefer certain kinds of wine. That a non-random (or even a random) selection of Americans can't tell the difference between good wine and bad wine hardly proves that there is no such thing as good or bad wine. It's like asking me to rank my preference for techno versus classical music when I've listened to each maybe a handful of times in my entire life. Or asking my kid to tell me whether he likes kale, mashed potatoes, or pizza. Of course he's going to pick pizza. Does that mean pizza is "better" in some objective sense? Seems to me like Levitt's committing the is/ought fallacy: just because something is the case doesn't mean it ought to be the case.
Update: One last thing. In that more recent paper by Goldstein and coauthors, they appear to use a robust standard error correction. But now that I think about it - and mainly I'm thinking of something my friend Matt once said to me about his own experimental work - shouldn't they be clustering on the session? The errors are almost certainly correlated within a session: the tastings were blind, yes, but they were done in public view of one another, and the surveys were filled out at tables where people could see each other's answers. Clustering at the session level would affect inference, and I'd be interested in how it changed the results. I also really wish that, instead of price, they were controlling for quality ratings based on Wine Spectator or Wine Advocate scores. Price is interesting, but the real measure of a wine's quality is going to be the "expert" ratings. I'd like to see both price and quality controlled for, to see whether preferences follow them or not.
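To make the clustering point concrete, here's a minimal sketch in Python (statsmodels) on made-up data - none of these variable names or numbers come from the paper - showing how a simple robust correction and a session-clustered correction can diverge once there's a common shock per session:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Toy tasting data: 20 sessions, 15 tasters each. Everyone in a session
# pours from the same set of bottles, so price is correlated within session,
# and a session-level shock (room, order, chatter) hits everyone's errors.
n_sessions, n_per = 20, 15
session = np.repeat(np.arange(n_sessions), n_per)
session_price = rng.lognormal(3.0, 0.5, n_sessions)[session]
price = session_price * rng.lognormal(0.0, 0.2, n_sessions * n_per)
shock = rng.normal(0.0, 1.0, n_sessions)[session]   # common within session
rating = 50 + 0.05 * price + shock + rng.normal(0.0, 1.0, len(price))

df = pd.DataFrame({"rating": rating, "price": price, "session": session})
ols = smf.ols("rating ~ price", data=df)

robust = ols.fit(cov_type="HC1")                    # "simple" robust SEs
clustered = ols.fit(cov_type="cluster",
                    cov_kwds={"groups": df["session"]})

# The clustered standard error on price comes out noticeably larger here.
print(robust.bse["price"], clustered.bse["price"])
```

The point isn't the particular numbers; it's that once the errors share a session component, the simple robust correction understates the uncertainty.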
2 comments:
Not clustering standard errors is a big red flag for me. If they didn't do it, they may have a good reason, but it's odd that they don't explain it. These days you want to cluster whenever you think the errors might be correlated, and an experimental session is exactly such a case. Beyond participants influencing one another, there are also "common shocks" (see Manski 1993) to the groups: time of day, day of week, climate, variation in bottle quality or temperature exposure, and so on. That stuff gets absorbed into the error term, and those elements of the error are going to be common to the whole session. I think that's right, anyway.
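For a back-of-the-envelope sense of how much that matters, the textbook Moulton approximation - assuming equal session sizes, equicorrelated errors, and a regressor that doesn't vary within session - says the conventional standard error should be inflated by

    sqrt(1 + (m - 1) * rho),

where m is tasters per session and rho is the intra-session error correlation. With, say, 15 tasters per session and rho = 0.1, that's sqrt(2.4) ≈ 1.55, so naive standard errors would be understated by roughly a third.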
Yeah, I was thinking the same thing - your points about your own experimental work are what made me wonder why they aren't doing it. The paper is very vague on the standard error correction. Nothing even about White. It just says "robust p-values in parentheses," which I'm assuming means a simple robust correction, like Stata's robust option. But even that's unclear. It's definitely nothing like a correction clustered at the level of the session, and if you watch the YouTube video on the book's website, you'll see what looks to me like the possibility of common shocks everywhere. Everything may be blind, but it's all done publicly and together. I suspect that, if nothing else, clustering would probably get rid of the inference on the "expert" dummy, since the combined significance of that interaction is only barely significant as it stands (p-value of 0.095).
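If you wanted to check that hunch, the mechanics are simple. Here's a hypothetical continuation of the toy data from the post above - the liked and expert variables and the specification are my guesses, not the paper's:

```python
# Fake outcome and expert indicator tacked onto the toy data frame from above.
df["expert"] = (np.arange(len(df)) % 3 == 0).astype(int)
df["liked"] = (df["rating"] + rng.normal(0.0, 1.0, len(df))
               > df["rating"].median()).astype(int)
df["price_x_expert"] = df["price"] * df["expert"]

fit = smf.ols("liked ~ price + expert + price_x_expert", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["session"]})

# Combined significance of the expert terms, analogous to the p = 0.095
# test in the paper; clustering will generally push this p-value up.
print(fit.wald_test("expert + price_x_expert = 0", use_f=True))
```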