I'll pose this question by means of an example.

Suppose I have a data set, such as the Boston housing price data set, in which I have continuous and categorical variables. Here, we have a "quality" variable, from 1 to 10, and the sale price. I can separate the data into "low", "medium" and "high" quality houses by (arbitrarily) creating cutoffs for the quality. Then, using these groupings, I can plot histograms of the sale price against each other. Here, "low" is $\leq 3$, and "high" is $>7$ on the "quality" score.

We now have a distribution of the sale prices for each of the three groups. It is clear that there is a difference in the center of location for the medium and high quality houses. Having seen this, I think: "There appears to be a difference in center of location! Why don't I do a t-test on the means?" Then I get a p-value that appears to correctly reject the null hypothesis that there is no difference in means.

Now, suppose that I had nothing in mind for testing this hypothesis until I plotted the data. Is it still data dredging if I thought: "Hm, I bet the higher quality houses cost more, since I am a human that has lived in a house before"?

Naturally, it is not data dredging if the data set was collected with the intention of testing this hypothesis from the get-go. But often we have to work with data sets that are given to us, and we are told to "look for patterns". How does someone avoid data dredging with this vague task in mind? Create hold-out sets for testing data? Does visualization "count" as snooping for an opportunity to test a hypothesis suggested by the data?

Briefly disagreeing with, or giving a counterpoint to, another answer: yes, visualizing your data is essential. But visualizing before deciding on the analysis leads you into Gelman and Loken's garden of forking paths (GoFP). This is not the same as data dredging or p-hacking, partly through intent (the GoFP is typically well-meaning) and partly because you may not run more than one analysis.
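To make the hold-out idea raised in the question concrete, here is a minimal sketch. It is only an illustration under stated assumptions: the data are synthetic stand-ins (not the real housing data), the helper names `quality_group`, `split_explore_confirm` and `permutation_p_value` are hypothetical, and a permutation test on the difference in means is swapped in for the t-test so that the example needs only the Python standard library. The point is the workflow: explore and plot one part of the data freely, commit to a single hypothesis, then test it once on the part you have not looked at.

```python
import random
from statistics import fmean


def quality_group(score):
    """Map a 1-10 quality score to the groups used above (low <= 3, high > 7)."""
    if score <= 3:
        return "low"
    if score > 7:
        return "high"
    return "medium"


def split_explore_confirm(rows, frac_explore=0.5, seed=0):
    """Shuffle the rows and split them into exploration and confirmation parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * frac_explore)
    return shuffled[:cut], shuffled[cut:]


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic stand-in data (an assumption for illustration): price rises with quality.
data_rng = random.Random(42)
rows = []
for _ in range(600):
    quality = data_rng.randint(1, 10)
    price = 100_000 + 20_000 * quality + data_rng.gauss(0, 30_000)
    rows.append((quality, price))

explore, confirm = split_explore_confirm(rows)

# Step 1: plot and inspect `explore` as much as you like, then fix ONE hypothesis.
# Step 2: run that single test on `confirm`, which played no part in choosing it.
medium = [price for quality, price in confirm if quality_group(quality) == "medium"]
high = [price for quality, price in confirm if quality_group(quality) == "high"]
p_value = permutation_p_value(high, medium)
print(f"hold-out p-value (high vs medium): {p_value:.4f}")
```

Because the confirmation half had no influence on which comparison was chosen, the forking-paths concern applies only to the exploration half. The cost is a smaller sample for the test, and the split only helps if the confirmation half really is left untouched until the hypothesis is fixed.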