Thursday, August 13, 2015

Reporting on Science News

Wow!  Who would have thought the first year of grad school, with its classes and teaching and exams and research and picking an adviser and deciding what to do with the rest of your life, would take up so much time?  As a result of this minor interruption, you'll have to forgive me for referring to what is now a year-old Skeptics' Guide episode, but it sparked an idea that I believe is still important.

The episode in question mentions a potentially misleading science news article about the extinction of the dinosaurs.  In the episode, Dr. Steven Novella argues that the focus of the news article leads readers to the opposite conclusion about the dinosaur extinction than the one the original research article intended.  The research article examines new data from around the time of the dinosaur extinction and finds that although there are some indications of a small drop in diversity among some types of herbivores in North America, it is still overwhelmingly likely that the Chicxulub impact is responsible for the mass extinction of the dinosaurs.  The news article emphasizes the idea that the diversity decrease made the entire dinosaur population more vulnerable to extinction and that, if the impact had come at a different time, the dinosaurs might have avoided extinction.  Although the second half of the news article moves back toward the conclusion of the original work, it gives an inordinate amount of attention to a small suggestion brought up in the paper and doesn't consider the overall scientific consensus.

In general, I've found that scientists' views on science news reporting are reserved at best.  Unlike opinion-based topics, science tends to build up enough evidence to support one side or another.  While new information can be uncovered, and sometimes we don't have enough information to make a decision, there is still often a correct and an incorrect conclusion to draw.  This aspect of science, combined with journalists' tendency to portray both sides of a story as if they are equally valid regardless of the known evidence, leads to many misleading articles and creates a phenomenon widespread enough to have its own name: "false balance."

But how widespread is false balance, actually?  It doesn't seem fair to dismiss all science news as poor quality without looking into it.  Science writing is essential, after all.  Science writers bridge the gap between scientists and the public.  Most people don't have the time, the background knowledge, or a good reason to sift through narrowly specialized scientific journals to find out what scientific and technological advances have recently occurred.  I can barely drag myself to read the papers my adviser sends me, and it takes a few months to get acquainted with the terminology of a new field.  Scientists still aren't off the hook when it comes to science communication, but science writers can help translate for the public.  They can shed light on important ideas.  They can inspire and excite and teach!  But they need to be doing it well.

It would be illuminating to investigate the quality of science writing across various media outlets using a sort of "grading rubric".  The following set of guidelines is my attempt at objectively evaluating science articles for good science.  To clarify, I'm trained as a scientist, not a writer, so these criteria focus on the quality of the science and how it's portrayed, not the quality of the writing.  Obviously there are restrictions on what fits in a column, but good science should make the cut.

We assume all articles start with a perfect score of "A" and deduct grade points as necessary.  Let's begin with the headline:
  • Headline -- One-third of a grade point is subtracted for each of the following:
    • Contains the word(s) "unexplainable", "mind-boggling", "baffling", "miraculous", "holy grail", and/or "boffin".
    • Is irrelevant to the topic to be discussed.
I gave the headlines less significance than the bulk of the article by limiting the magnitude of deductions to a third of a grade point.  Now let's look at the bulk:
  • Individual claims -- The rating drops by a full grade point for each false or inaccurate statement made in the article.
I had some difficulty deciding how to apply the "individual claims" guideline due to the potential for articles to vary greatly in length and for mistakes to vary in magnitude.  Rather than attempt to create a spectrum of inaccuracy, I decided to stick with a binary system where a statement is either factual or it isn't.  This also reduces any bias that might come from people being more protective and picky about facts from their particular field.  Of course, necessary simplifications are considered different from factual errors and do not warrant deductions.  For example, saying an electron is a point particle instead of discussing wave-particle duality would be considered a simplification and not incorrect in most contexts.

We also need to look at the article as a whole and not only in terms of individual claims.  For that I include criteria on contextualization:
  • Context -- Statements and conclusions in the article need to be tempered by context.  What is the scientific consensus?  Why have scientists come to that conclusion?  If false balance is present in the article then a grade point will be deducted.  For example, if the article spends equal time presenting the views of scientists who claim that climate change is real and those of scientists who claim it is not real when, in reality, 99% of scientists would agree that climate change is happening, the writer will be penalized.
  • Conclusions -- An automatic failure is given to any article that leaves the reader with an overall impression or conclusion that is opposite of the conclusion reached in the original scientific paper.
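To make the bookkeeping concrete, here is a rough sketch of how the tallying could work.  The reader still supplies all the judgments (how many headline violations, how many false claims, and so on), and the 4.0-scale letter cutoffs below are my own assumption, not part of the guidelines themselves.

```python
def grade_article(headline_violations, false_claims, has_false_balance, opposite_conclusion):
    """Tally the rubric: start at 4.0 ("A") and deduct points.

    All inputs are judgments a human reader supplies; this function only
    does the arithmetic.  The letter-grade cutoffs below are an assumption.
    """
    if opposite_conclusion:
        return "F"  # automatic failure for reversing the paper's conclusion

    score = 4.0
    score -= (1.0 / 3.0) * headline_violations  # a third of a grade point each
    score -= 1.0 * false_claims                 # a full grade point each
    if has_false_balance:
        score -= 1.0                            # one grade point for false balance

    # Map what remains back to a letter grade (assumed cutoffs).
    for cutoff, letter in [(3.5, "A"), (2.5, "B"), (1.5, "C"), (0.5, "D")]:
        if score >= cutoff:
            return letter
    return "F"


# Example: clean headline, one inaccurate statement, no false balance -> "B".
print(grade_article(headline_violations=0, false_claims=1,
                    has_false_balance=False, opposite_conclusion=False))
```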
All that I'm asking for is science articles that are factual, appropriately contextualized, and not sensationalized to the point of being misleading, and I'm curious to see what the current state of science writing actually is.  That's why I need your help.  My guidelines need to be tested to see if they are reasonable.  Are they objective and fair?  Does the same article receive the same grade regardless of the person reading it?  Is the final criterion too subjective and too powerful?  I am happy to see The New York Times volunteer to be my first test subject, by virtue of my noticing their article on social octopuses in the Google News science section and their having a publicly accessible link to the main paper it references.  Let's see how they, and my criteria, hold up.

We will need many, many data points before we can tell which media outlets are trustworthy and which need better quality control when it comes to science news.  If you take the time to test a news article and compare it to the press release or original paper (or both), please let me know!  Just list the journalist, the publication, the titles of the news article and the comparison piece (a research paper or a press release), your grade, and some remarks about how you used the guidelines in the comments section below, so we can work out any bugs.  And I hope my older posts can pass my own test!

Thursday, August 21, 2014

Measuring Boston's Green Line



I moved from LA to Boston about a year ago.  Since I hate driving, I was looking forward to moving somewhere with a more effective (read: existing) public transportation system.  At least, I was looking forward to it until I realized my apartment was situated on one of the green lines of the T (I recommend the 4th and 9th reviews especially).

For those of you unfamiliar with Boston’s public transit system, known as “the T”, it consists of the red, blue, and orange lines, all of which are reasonably timely subways, and then the green line.  The green line has four different branches, most of which operate at street level and, for some reason, lack any sort of tracking or timing system.

One day at work, I mentioned my disappointment with the green line to my labmates, which led to a heated debate about which of the four green line branches was the worst.  I was convinced the B line was slower than the C line, while my friend, Aishu, adamantly believed that the C line was slower.  One of the senior postdocs, who had been living in Boston longer than both of us, explained that the quality of the green line branches decreased alphabetically.  The B branch was the best, followed by the C, then the D, and, finally, the E.

In spite of his expertise, I was unconvinced that his rankings were true.  In an attempt to settle the matter of the T’s green line ratings and do a bit of research into the quality of Boston’s urban planning, I got Aishu to help me with an experiment.  For about six months, we recorded the amount of time we spent waiting for various outbound green line trains, which is arguably as valid a use of time as playing mobile phone games while waiting for the subway.

Whenever we walked into a station that served at least the B, C, and D lines, we noted the time when we got to the platform and the times at which trains arrived, up until the train we were waiting for came into the station.  We also noted the crowdedness of the train car we rode in.  I should point out that because the E line shares fewer stops with the B, C, and D lines than they do with each other, we do not have much data on the E line.  Additionally, most of our data comes from weekdays between 6:00 and 8:00 pm.  It is also important to note that we did not measure the average waiting time for inbound trains or how quickly the trains traveled to their destinations.  Still, we ended up with enough information to test the Massachusetts Bay Transportation Authority’s claims about the average wait time for green line trains.
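For anyone tempted to repeat this, very little bookkeeping is required.  Here is a minimal sketch of the kind of record we kept for each wait; the field names and the example values are my own invention for illustration, not an actual logging format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WaitObservation:
    # Field names are hypothetical; they just mirror what we wrote down.
    station: str                  # shared B/C/D station where we waited
    platform_arrival: str         # clock time we reached the platform, e.g. "18:42"
    trains_seen: List[str] = field(default_factory=list)  # branches in arrival order, e.g. ["D", "B"]
    desired_branch: str = "B"     # the branch we were actually waiting for
    wait_minutes: float = 0.0     # platform arrival to the desired train's arrival
    crowdedness: int = 3          # 1 (spacious) to 7 (too full to get on), defined later

# Example entry: waited 4 minutes for a B train while one D train passed first.
obs = WaitObservation("Park Street", "18:42", ["D", "B"], "B", 4.0, 3)
```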

The first thing we looked at after recording our six months’ worth of data was the frequency of the trains.  How many trains would you have to watch go by before you finally got the one you wanted?  While not necessarily as informative as the actual waiting time, this factor still plays some role in determining how irritating each green line branch is.  As you wait on the platform, your hopes rise every time you hear the piercing shriek of a green line train coming into the station, only for it to be the wrong branch.  Repeat this experience of having your hopes dashed several times in a row, even if the interval between incorrect trains is just one minute, and it’s easy to see why train frequency matters.

Our data is summarized in the figure below: 


Of course, there are fewer instances of trains being the 5th or 6th train to go by, since we stopped recording once we got on the train we were waiting for.  I was also usually waiting for the B line and recorded data more often, so there are more instances of B line trains, while Aishu was usually waiting for the C line.  Setting those biases aside, we also find that when an outbound D line train arrives at a shared station, it is more likely than a B or C train to be the first train to come by.  Of the 35 recorded D trains, 42% were noted to be the first train of a sequence, compared with 28% of B line trains and 30% of C line trains.  In the majority of outbound trips, it seems you will only have to watch one train go by before the one you want arrives.
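Those “first train to arrive” percentages come from a simple count over the recorded arrival sequences.  Here’s a minimal sketch of the calculation, with made-up sequences standing in for our real data:

```python
from collections import Counter

# Each entry is the order in which branches arrived during one wait
# (hypothetical data, purely to illustrate the calculation).
sequences = [
    ["D", "B"],
    ["C"],
    ["B", "D", "B"],
    ["D", "C"],
]

first_counts = Counter(seq[0] for seq in sequences)
total_counts = Counter(branch for seq in sequences for branch in seq)

for branch in sorted(total_counts):
    share_first = first_counts[branch] / total_counts[branch]
    print(f"{branch}: first train of the sequence in {share_first:.0%} of its appearances")
```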

But knowing how many trains you’ll have to count before you can get on one doesn’t tell you much about how long you’ll actually have to wait.  To determine that, we present the following data:

Note: The 12- and 14-minute values are excluded for formatting reasons and because no trains were recorded at those times.
86% of outbound B trains arrive in five minutes or less, as do 86% of outbound D trains; however, only 59% of outbound C trains will require you to wait less than five minutes.  This is true despite the fact that the single most common waiting time for a C train is 0 minutes.  One in four times when you step into a station, a C train will be waiting there for you, but if you miss it you might have to wait a while.  The longest wait time recorded was for a C train, at 16 minutes, but on average you’re only going to have to wait about five minutes (see below).
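For what it’s worth, those percentages are just cumulative fractions over the recorded waits.  A minimal sketch, with invented wait times standing in for our real numbers:

```python
# Hypothetical wait times (minutes) for one branch, standing in for real data.
waits = [0, 0, 2, 3, 4, 5, 7, 9, 11, 16]

within_five = sum(1 for w in waits if w <= 5) / len(waits)
average = sum(waits) / len(waits)

print(f"{within_five:.0%} of waits were five minutes or less")  # 60%
print(f"average wait: {average:.1f} minutes")                   # 5.7 minutes
```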

The average wait times for the other trains are shown in the following figure:


We have averages of slightly over three minutes for the B line and slightly under three minutes for the D line.  The E line appears to be on par with the C line, at around 4.5 minutes, but keep in mind that we only recorded seven data points for E line trains, so that value carries a fair amount of uncertainty.

According to the MBTA, B line commuters can expect to wait 10 minutes for a train in the evening (6:30 pm to 8:00 pm); C line commuters, seven minutes; D line commuters, 10 minutes; and E line commuters, 10 minutes.  While the average waiting times for all the lines are below the MBTA estimates, remember that the MBTA does not distinguish between outbound and inbound trains and may list longer-than-expected times to be safe.  Nevertheless, it is somewhat reassuring to know that when waiting for an outbound train, you will probably be on your way in under five minutes.  Still, I’m sure Bostonians would appreciate it if real-time updates were made available.

So you’ve waited your average 3.8 minutes, caught your train, and are on your way home.  How much space will you get to yourself on the ever luxurious green line?  To take into account the amount of discomfort one might have to endure on their commute, Aishu and I came up with a seven-point crowdedness scale:
  • 1 – “spacious”: “Wow, I practically have this train to myself!  Actually, that’s kind of weird…”
  • 2 – “not full and not empty”: there are a few people sitting.
  • 3 – “full/normal”: the seats are all full and there are some people standing, but there is enough space to move around.
  • 4 – “crowded”: you might have to stand closer to other passengers than desired, but you can still breathe.
  • 5 – “very crowded”: “I’m starting to get ever so slightly uncomfortable.  I really hope nobody tries to get on at the next stop.”
  • 6 – “stupidly crowded”: “How did more people fit onto this train?  If I don’t get out of here soon I’m going to start biting everybody who’s touching me!”
  • 7 – “too full to get on”: which I expect I don’t need to define.
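If you want to log this yourself, the scale is trivial to encode; the short labels below are just the names from the descriptions above.

```python
# Short labels for the 1-7 crowdedness scale described above.
CROWDEDNESS = {
    1: "spacious",
    2: "not full and not empty",
    3: "full/normal",
    4: "crowded",
    5: "very crowded",
    6: "stupidly crowded",
    7: "too full to get on",
}

print(CROWDEDNESS[3])  # "full/normal"
```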

As we primarily took the B and C trains, we only have crowdedness data for those two lines.  The results are as follows:



The average crowdedness of both lines was around 3 (“full/normal”), although the C line tended to be slightly less crowded than the B line.  The only recorded instances of a train being too full to get on (level 7) were on B line trains.  Never fear, though: you are likely to find space on your train on either line.

Overall, it looks as if reality is actually better than what the MBTA predicts, at least for outbound trains.  On average, you will wait less time than the MBTA website claims, and you’ll have space in your train car, if not a seat.  Counter to our postdoc’s beliefs, the quality of the green line branches doesn’t decrease in alphabetical order, as shown by the D line having the shortest average waiting time.  It is important to note, though, that without data on how much time you will actually spend on the train, it is still difficult to conclude whether the green line is worth using.  Considering the average speed (4.1 miles in 24:49?) of a green line train and how small Boston is, might I recommend a bicycle instead?
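For the curious, that parenthetical works out to roughly ten miles per hour; here’s the quick arithmetic:

```python
# 4.1 miles in 24 minutes 49 seconds, as quoted above.
miles = 4.1
hours = (24 + 49 / 60) / 60

print(f"{miles / hours:.1f} mph")  # about 9.9 mph
```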