Tuesday, March 4, 2014

Follow Up on Cherry Picking

In my post regarding cherry picking I wrote that purposefully excluding data from a data set is misleading and unethical, but I forgot to mention that there are some exceptions.  It can be necessary to exclude data points that were gathered outside of controlled conditions.  Imagine measuring the average growth rate of bacteria at 37 °C over many trials across many days.  Several days in a row, the bacteria are plated at a known concentration in a Petri dish, incubated at 37 °C overnight, and the amount of growth is measured in the morning via optical density or whatever your favorite method is (don't pretend you don't have one).  If the power goes out one night and the 37 °C incubation room drops to -15 °C, the data from that trial are no longer valid for calculating your average.  Including them would skew the results unfairly, because the conditions under which they were gathered are not the conditions you are testing.  If you end up leaving certain data points out of your analysis, you need to state explicitly which ones you excluded and why.
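For anyone who keeps their trial data in a script, here is a minimal sketch of what a legitimate exclusion could look like in Python.  The readings, the field names (`od600`, `exclude_reason`) and the `summarize` helper are all made up for illustration; they are not from a real experiment.  The point is that a trial is dropped because of its recorded conditions, never because of its value, and that the reason travels with the result.

```python
from statistics import mean

# Hypothetical overnight optical density readings (illustrative numbers only).
# A trial is flagged when its conditions were out of spec, and the reason is
# written down at the time, before anyone looks at the average.
trials = [
    {"day": 1, "od600": 0.82, "exclude_reason": None},
    {"day": 2, "od600": 0.79, "exclude_reason": None},
    {"day": 3, "od600": 0.05, "exclude_reason": "power outage; incubator fell far below 37 °C"},
    {"day": 4, "od600": 0.85, "exclude_reason": None},
]


def summarize(trials):
    """Average the valid trials and report every exclusion with its reason."""
    kept = [t for t in trials if t["exclude_reason"] is None]
    excluded = [t for t in trials if t["exclude_reason"] is not None]
    return {
        "mean_od600": mean(t["od600"] for t in kept),
        "n_kept": len(kept),
        "excluded": [(t["day"], t["exclude_reason"]) for t in excluded],
    }


summary = summarize(trials)
print(f"Mean OD600 over {summary['n_kept']} valid trials: {summary['mean_od600']:.2f}")
for day, reason in summary["excluded"]:
    print(f"Excluded day {day}: {reason}")
```

Notice that the filter never looks at `od600`.  If you find yourself writing an exclusion rule that depends on the measurement itself, you are back in cherry picking territory.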

Throwing out half your data from a long-term study after you realize you're not going to get the results you want is still cherry picking, though, regardless of your North Carolina residency status.
