
Tag Archives: statistics

The State of Science Education

I’m not sure how I came to be reading this article, which is especially strange because the byline states that it was published over a year ago in The American Spectator. I expect it was mindless click-bait-following on Yahoo that brought me there, but what I found was the tragic remnants of a mind denied a proper education in scientific method, logic, and mathematics.

Emily Zanotti wrote up her impressions of a scientific study she had uncovered in an article titled, “Study Finds John Kerry Worst Secretary of State in the History of Ever.” John Hayward wrote a similar piece for Breitbart the same day, as did the Washington Post, under the slightly less scathing headline, “Scholars’ votes put Kerry last in terms of effectiveness.” So, why focus on this minor publication’s reporting over more mainstream outlets? I don’t have much reason other than that I found the article there first and that the visceral nature of the title held my attention best (remember, I found it by following click-bait while trying to find a reasonable source of right-wing news).

I really don’t care (at least for the purposes of this discussion) one bit about the actual question, but would rather focus on how these data were interpreted for the popular press (using Ms. Zanotti’s article as my example).

First, the data:

The articles I found pointed to Foreign Policy Magazine as the source; the actual data, in the rawest form I could find, can be found here.

Who was polled? (Because this is an opinion survey.) The poll was sent to International Relations (IR) faculty at colleges and universities around the country. Responses were received from 1,615 IR scholars drawn from 1,375 U.S. institutions.

The question about Secretaries of State was one of many, and was phrased as, “Who was the most effective U.S. Secretary of State in the past 50 years?” We’re told that the number of responses to this particular question was 655. I find it rather strange that only about 40% of the respondents answered this question, especially given that one of the most popular answers was ‘I don’t know’, which received 18.32% of the vote, or 120 ballots.

The Results were reported as:

[Table: the survey’s reported results]

If you count them up, there were 13 distinct answers given, only 12 of which are people once we drop ‘I don’t know’. This is interesting, because the headline read that Kerry was the worst.

I agree that he is last on this list, but depending on how you count, the number of people who have sat in that office over the past 50 years was either 15 actual Secretaries of State or 28 Secretaries plus acting Secretaries.

Either way you slice it, we’re missing some people from this list. A quick look at trusty Wikipedia shows us who we’re missing (see below). Where is William Rogers in the poll? Edmund Muskie? Apparently these poor souls got zero votes, so they don’t show up in a percentage-based calculation.

[Table: the full list of U.S. Secretaries of State, from Wikipedia]

At 0.31%, poor John Kerry received only 2 votes as the most effective Secretary of State (SoS) of the past 50 years. But Rogers and Muskie apparently got zero. It’s hard to see how this puts Kerry in last place.
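The mechanics are worth making explicit: a percentage table built from vote counts simply has no row for anyone with zero votes. A quick sketch using the counts reported above (the zeros for Rogers and Muskie are my inference, not a reported figure):

```python
# Vote counts from the survey; Rogers and Muskie assumed at zero
votes = {"Don't know": 120, "Kerry": 2, "Rogers": 0, "Muskie": 0}
total = 655  # total responses to this question

# Percentage table: zero-vote entries drop out entirely
shares = {name: round(100 * n / total, 2) for name, n in votes.items() if n > 0}
print(shares)  # {"Don't know": 18.32, 'Kerry': 0.31}
```

The 18.32% and 0.31% figures from the survey fall out of the arithmetic, while the zero-vote names vanish without a trace.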

But that’s not all. The question asks, “Who was the most effective SoS?”, which isn’t the same as asking respondents to “rank the SoSs according to effectiveness.” What would the percentages look like if two Secretaries (say, Bob and Hank) were the clear front-runners and everyone agreed on that point? Moreover, imagine that everyone also agreed that a third Secretary (Sally) came in right behind the two front-runners but couldn’t compare to the undeniable efficacy of the first two. My guess is that Bob and Hank would split the 655 votes, and no one else would get any. Not even Sally. Everyone else would tie for last place.

Ms. Zanotti’s piece continues, “John Kerry is the worst Secretary of State in history according to a survey of professors at the top 25 foreign policy schools conducted by Foreign Policy Magazine, losing out, even, to ‘Don’t Know.'”

But no one was asked to name the worst SoS. What is true of this survey is that very few people think Kerry is the best. As for ‘Don’t know’, I imagine these are the people who simply can’t decide between Bob and Hank from the example above; those two are just too close to call. Further, since ‘Don’t know’ came in second, it’s hard to say that Kerry was beaten by this answer in any meaningful way.

She continues, “Of the scholars who responded, Kerry earned exactly two votes, and came in after Lawrence Engleberger who was Secretary of State for a whole six weeks at the end of George H. W. Bush’s second term, and spent most of that time keeping the chair warm for the Clinton appointee.” Which had me wondering whether one of the best ways to be an effective SoS is to not have much happen, as is the case when someone is in office for only a very short time.

One last statement on the data before going to her conclusion. In reporting the results of the poll, she mentions that “James Baker — who was actually the most effective secretary in the last 50 years” came in third at 17.7%. Wait – what? Where does this “who was actually the most effective secretary” come from? Was there another measure that we haven’t been provided with? My guess is that there was not, but that this is the author’s admission that she had already determined the right answer, and that no data applies to this opinion.

Ms. Zanotti then ends her essay with this strange statement: “At any rate, it’s nice to know our collective impression of Kerry’s effectiveness is objectively verified.”

I guess she knows her audience, so the “our collective impression” probably makes sense, but what about the idea that Kerry’s effectiveness has now been objectively verified?

What about an opinion poll could ever result in an objective verification? I suppose she might mean that it is now verified that many people hold some opinion, but it’s hardly objective. To arrive at this conclusion, we have to accept that the sum of many (at least 655) subjective opinions is equal to an objective conclusion.

This is equivalent to saying that a poll of 400 Yahoos objectively verifies that the sum of 2 and 2 is 8. All we have is a group of opinions. They might be the opinions of very smart people who are speaking within their field, but there’s no logical necessity that they are correct.

So, to return to my title: I can only assume that this hot mess of an essay comes from a completely science-deprived education. Who is the author, Emily Zanotti? I had never heard of her and had her pegged as a one-off writer who wasn’t really involved in the world, but I stand corrected. She’s apparently well known on the Right as an outspoken libertarian. Her Twitter bio reads, “Writer, blogger, comedian, nerd. Cosplayer. Catholic. Political reporter. Resident geek. Libertarian. Opinions my own but should be yours.” Her R Street bio calls her “a columnist for the The American Spectator and an associate fellow of the R Street Institute. She is a ten-year veteran of political communications and online journalism based out of Chicago, where she runs her own digital media firm. Her work has appeared at her former blog, NakedDC, and across the web. She has a law degree from Ave Maria School of Law with a focus in intellectual property and technology law” (sic).

I’d never heard of the Ave Maria School of Law, so I had to look that up too. The Miami New Times had this to say about the school:

Meanwhile, Ave Maria — founded by Domino’s Pizza magnate Tom Monaghan and relocated in 2009 from Ann Arbor, Michigan — continues to plummet, finishing dead last with a horrific 47.8 percent of students passing the [Florida Bar Exam].

However, this poor performance is apparently relatively new; former Ave Maria law professor Charles Rice states in the same article that the school’s performance was good prior to the move. Together, these statements make it difficult to use the school as any proxy assessment of a person’s education. Regardless, her R Street bio suggests that she’s not a fool. Therefore, I’m left wondering: how did this article happen?

 

 
Posted on April 17, 2016 in Uncategorized

Epidemiology: Should farmers try to do more work near noon?

The CDC has a wealth of classroom information (case studies, discussion material) regarding epidemiology. No surprise there. It’s what they do.

In my Microbiology class we’re starting a unit on epidemiology that students are working on in their free time, either alone or in groups. We will talk about the project as questions come up, but mostly I wanted people to have an opportunity to think freely – i.e., without me forcing my own ideas on them.

In my Ecology (population genetics, etc.) class, we spent some time last week discussing how data is just data: in the absence of a reason to mistrust it, it probably makes sense to assume that the data is correct. However, this leaves the interpretation of the data up for much debate. ‘How so?’ I was asked. ‘Because people run experiments with certain ideas in mind that they would like to support or undermine. There can be many ways to misinterpret data.’

With this in mind, I ask you…

Should farmers try doing more work near noon?

Data suggests that this is the safest time of day. Yet, anecdotally, fewer farmers are putting time in the field at this hour than at any other hour of the day (8 am to 8 pm). What’s going on?

[Chart: farm injury data by hour of day]
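One uncontrolled variable worth probing here is exposure: raw injury counts say nothing about how many farmers were actually in the field at each hour. A toy illustration with invented numbers (these are not the CDC’s data):

```python
# Invented counts, purely to illustrate counts vs. rates
injuries = {"10am": 30, "noon": 12, "3pm": 28}
worker_hours = {"10am": 3000, "noon": 600, "3pm": 2800}

# Injuries per 1,000 worker-hours of exposure
rate = {h: injuries[h] / worker_hours[h] * 1000 for h in injuries}
print(rate)  # {'10am': 10.0, 'noon': 20.0, '3pm': 10.0}
```

In this hypothetical, noon has the fewest injuries only because it has the least exposure; per hour actually worked, it would be the most dangerous time of all.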

 
Posted on April 21, 2014 in Uncategorized

5 Beautiful Infographics

I’ve been listening to a new podcast about big data, data mining, and data visualization while running, and it got me thinking about the way data is presented.

In the lab, beautiful data means clarity and precision of results with the assumption that the observer can do the work to understand with a minimum of assistance. Here’s some cell proliferation data using CFSE (a dye that stains cells and is diluted every time a cell divides):

Image

published in Nature by Dawkins et al 2007
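The logic behind those CFSE peaks is simple halving: the dye is split between the two daughter cells at each division, so fluorescence after n divisions is the starting intensity over 2^n. A quick sketch with an arbitrary starting intensity:

```python
start = 1000.0  # arbitrary fluorescence units in the undivided population
for n in range(5):
    # each division halves the dye inherited by a cell
    print(f"division {n}: {start / 2**n:g} units")
# division 0: 1000, division 1: 500, division 2: 250, ...
```

Each successive peak in the histogram sits at roughly half the fluorescence of the one before it, which is how the number of divisions is read off the plot.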

Outside the laboratory, on the other hand, data is best presented in a way that clearly expresses the message with the least possible explicit explanation. I collected five infographics from the web that I thought accomplished this goal best (and that presented data I was at least partially interested in).

A beautiful visualization of population density across the United States, done by Time magazine:

Image

The legend serves only to translate scale into actual numbers, but the meaning is clear enough without actually needing it at all.

Here Forbes shows which source of media predominates in each of the United States:

Image

 

This media infographic best illustrates that sometimes the information used to create an infographic is close to worthless, yet can still make a compelling presentation. As such, it probably represents the best argument against these presentations: “Is this information really worth knowing?” or “Is this really information at all?”

 

 

Mmmm Coffee. I certainly do love coffee…

Image

 

Some data that’s a little more serious: the good work of vaccines is invisible. It’s very hard to wake up, look out on the world, and think, “I sure am glad so many people are not getting sick from vaccine-preventable illness.” Here’s a way to actually see that:

 

Image

Again, from Forbes

Lastly, how does your level of education correlate with salary and your chance of being unemployed? These are numbers that perhaps every parent should consider when talking to their child about educational goals.

Image

I can’t say that I’m receiving that last benefit currently, but perhaps I should consider it motivating.

 
Posted on November 17, 2013 in Uncategorized

Experimental Flaws -Uncontrolled variables

I had an interesting text message from my cousin today. He was asking, ‘What is meant when a study is deemed to be flawed due to uncontrolled variables? I.e., what does it really mean to have uncontrolled variables?’

It’s an excellent question, and one that is well addressed in a book I recently recommended here, How to Lie With Statistics.

I gave him the following answer:

‘A simple example might be someone looking back through historical data and seeing that the number of cancer cases (of all kinds) has been on the rise over the past twenty years. In terms of absolute numbers, this is true. Some people use this to raise the alarm that we have to get more aggressive in our fight against cancer because it has become a leading killer. Perhaps that’s not a bad idea either, but if someone were to look more closely at the details, they would quickly see that these absolute numbers aren’t the right data on which to base this conclusion. There are uncontrolled variables.

Here’s some real data:

The unaltered, or crude, cancer death rate per 100,000 US population for 1970 was 162.8. Multiplying this rate by that year’s US population of 203,302,031 and dividing by 100,000 gives the total cancer deaths for the year: 330,972. Dividing that by the number of days in a year gives the average number of Americans who died of cancer each day in 1970: 907.

Twenty years later, total cancer deaths for 1990 were 505,322, out of a population of 248,709,873. The crude cancer death rate per 100,000 population had risen to 203.2, and the daily cancer death toll to 1,384.

(http://www.gilbertling.org/lp2.htm – original data: The 1970 cancer death rate was taken from p. 208 of the Universal Almanac, John W. Wright, Ed., Andrews and McMeel, Kansas City and New York. The estimated 1996 cancer deaths figure was taken from Table 2 in “Cancer Statistics” by S.L. Parker et al., CA: A Cancer Journal for Clinicians, Vol. 65, pp. 5-27, 1996. The 1970 US population was taken from the World Almanac and Book of Facts, 1993, p. 367; the estimated 1996 population was from the 1997 edition of the World Almanac and Book of Facts, p. 382. The 1997 total cancer death figure was obtained from S.H. Landis et al., CA: A Cancer Journal for Clinicians, Vol. 48, pp. 6-30, 1998, Table 2. The US population for 1997 was obtained from the official statistics of the US Census Bureau released on Dec. 24, 1997.)
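The arithmetic in those two paragraphs is easy to check from the quoted rates and populations:

```python
def total_from_rate(rate_per_100k, population):
    """Convert a crude rate per 100,000 into a total count."""
    return rate_per_100k * population / 100_000

deaths_1970 = total_from_rate(162.8, 203_302_031)
print(round(deaths_1970))        # 330976 (the source rounds slightly differently: 330,972)
print(round(deaths_1970 / 365))  # 907 deaths per day

deaths_1990 = 505_322
print(round(deaths_1990 / 248_709_873 * 100_000, 1))  # 203.2 per 100,000
print(round(deaths_1990 / 365))                       # 1384 per day
```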
However, if this is the limit of the analysis, it’s useless. In 1970 the life expectancy for a white, non-Hispanic male was about 67 years, while in 1990 it was about 74.
Since cancer is a disease of the aged, it is likely that the increase in cancer deaths is directly linked to the growth of the elderly population.
What this means is that, for the comparison to be meaningful, the authors should look at cancer rates among a more comparable group: perhaps white, non-Hispanic, non-smoking males living in a region that has not undergone drastic demographic change or excessive immigration/emigration. By taking these additional steps, we reduce the number of differences between our two populations, allowing us to make a ‘more controlled comparison.’
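The age effect can be made concrete with a toy calculation. In this sketch the age-specific death rates are identical in both years (all numbers invented), yet the crude rate still climbs, purely because the later population is older:

```python
# Invented age-specific death rates per 100,000 (identical in both years)
rates = {"0-39": 10, "40-64": 150, "65+": 900}

# Invented age structures (percent of population in each bracket)
pop_1970 = {"0-39": 70, "40-64": 22, "65+": 8}
pop_1990 = {"0-39": 60, "40-64": 25, "65+": 15}

def crude_rate(age_structure):
    # crude rate = age-specific rates weighted by the population's age mix
    return sum(rates[g] * age_structure[g] for g in rates) / 100

print(crude_rate(pop_1970))  # 112.0
print(crude_rate(pop_1990))  # 178.5 -- higher, with no change in risk at any age
```

This is the uncontrolled variable in a nutshell: the risk at every age is unchanged, but the crude rate rises because the age mix shifted. Age standardization exists precisely to remove this effect.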
 
Posted on September 20, 2013 in Uncategorized

Drift

In population genetics there are two equations that allow us to estimate the frequency of alleles within a population and also to estimate the number of homozygotes vs. heterozygotes for a recessive trait. These are known today as the Hardy-Weinberg equations because they were proposed simultaneously by two independent scientists. Like many equations, they assume a model that is not exactly reflective of the real world; however, they do lend us an understanding of the rules of the system.

The two equations are:

p + q = 1

p² + 2pq + q² = 1

It’s that easy. In each of these equations, p stands for the frequency of one allele in a population and q stands for the frequency of the other. Assuming there are only two alleles, they must add up to 100%, represented here by the decimal number 1.
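As a worked example, suppose allele A has frequency p = 0.9; the second equation then gives the genotype frequencies directly:

```python
def hardy_weinberg(p):
    """Genotype frequencies for a two-allele locus at equilibrium."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

freqs = hardy_weinberg(0.9)
print({k: round(v, 2) for k, v in freqs.items()})
# {'AA': 0.81, 'Aa': 0.18, 'aa': 0.01}
```

Note that only 1% of individuals show the recessive phenotype even though the recessive allele makes up 10% of the gene pool; most copies of it hide in heterozygotes.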

In order to use these equations, certain conditions must be adhered to.

  1. No gene flow (immigration / emigration)
  2. No sexual selection
  3. No survival selection
  4. No mutations
  5. No genetic drift

 The last one is the one that has been interesting me lately.

What is genetic drift? It describes statistical anomalies, like a run of ‘red’ on the roulette wheel or an unexpectedly long string of ‘heads’ when tossing a coin.

 What happens during genetic drift is that one allele becomes favored just because of such a statistical swing. But unlike roulette or coin tosses, when an allele loses out for a number of generations, it stands a diminishing chance of being seen again. The statistical anomaly becomes ‘hard-coded’ and self-reinforcing, such that eventually alleles disappear.

The key is that small samples allow genetic drift to happen more often, while larger populations tend not to see this occur. Using our coin toss example, if you toss a coin ten times, it is not especially surprising to get 8 ‘heads’ and 2 ‘tails’. Whereas, in a toss of 1,000 coins, getting 800 ‘heads’ is nearly inconceivable.

I encountered this while coding a genetics simulation program (note: my simulation uses a Wright-Fisher model that has distinct, non-overlapping generations).  I wrote the program and started testing it by allowing random breeding to occur over 100 generations or so. I started using only 100 animals in my simulation, but regularly saw one allele outcompete all others, meaning that the population had lost diversity.

Below is an example of 100 organisms carrying four alleles of the gene, breeding randomly for 200 generations.

 Image

I was sure it was a problem with my algorithm. Then I started increasing the number of animals and the ‘problem’ went away.

Here’s a second experiment at the other end of the spectrum, using 50,000 animals, also with four alleles, breeding for 200 generations. I’ve forced Excel to graph this on the same axes.

 Image

All this, just to demonstrate to myself that the prohibition against genetic drift is actually another way of saying, “This only works with large populations.” 
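For anyone who wants to reproduce the behavior, here is a minimal sketch in the spirit of the simulation described above (a Wright-Fisher model with non-overlapping generations; this is an illustrative rewrite, not the actual program):

```python
import random
from collections import Counter

def simulate(n_individuals, n_alleles=4, generations=200, seed=42):
    """Each generation, every gene copy is drawn at random from the previous pool."""
    rng = random.Random(seed)
    # Diploid gene pool with the alleles equally represented at the start
    pool = [i % n_alleles for i in range(2 * n_individuals)]
    for _ in range(generations):
        pool = rng.choices(pool, k=len(pool))  # random mating, constant size
    return Counter(pool)

print(len(simulate(100)))     # small population: often fewer than 4 alleles survive
print(len(simulate(10_000)))  # 4 -- a population this large holds all four alleles
```

Re-running the small case with different seeds shows alleles being lost in many runs, while the large population essentially never loses one over 200 generations, which is the "drift needs small populations" point in miniature.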

What interested me is how to know whether your population is large enough to ‘resist’ genetic drift, and how quickly genetic drift will drive alleles to fixation or loss.

“The expected number of generations for fixation to occur is proportional to the population size, such that fixation is predicted to occur much more rapidly in smaller populations.”

Not surprisingly, there is an equation designed to predict the time (# of generations) before an allele is lost by drift.

The expected time for the neutral allele to be lost through genetic drift can be calculated as

T = -4Ne [ p / (1 - p) ] ln(p)

where T is the number of generations, Ne is the effective population size, and p is the initial frequency for the given allele.

(This section is informed greatly by the work of Otto and Whitlock at the University of British Columbia, Vancouver.)
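Plugging numbers into that expected-loss-time expression shows the linear dependence on population size. For an allele starting at p = 0.25 (one of four equally common alleles, as in my simulations):

```python
import math

def time_to_loss(Ne, p):
    """Expected generations until a neutral allele at frequency p is lost."""
    return -4 * Ne * (p / (1 - p)) * math.log(p)

print(round(time_to_loss(100, 0.25)))     # 185 generations
print(round(time_to_loss(50_000, 0.25)))  # 92420 -- 500x the population, 500x the wait
```

The ~185-generation figure for 100 individuals lines up nicely with watching alleles disappear within a 200-generation run, while 50,000 individuals put loss far beyond any reasonable simulation window.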

Sometimes having a computer simulation comes in handy to help get a better look at how these rules apply to different populations. I’d like to build this simulation into a simple app for desktop or mobile to make public, but I have been having a lot of difficulty making the leap from a program running in the console to something worth sharing.

 
Posted on August 8, 2013 in Uncategorized

All in a kerfuffle

I’m all bent out of sorts since I decided to write about the green coffee extract paper popularized by Dr. Oz. 

Here’s the problem: in my last post I attempted to unpack the data presented in the article describing a weight-loss trial using this supplement. Yet the closer I examined the data, the clearer it became that the data presented in that paper does not support any conclusions.

This does not mean that the supplement is effective or not. It doesn’t even mean that the group is lacking in data that would answer the question. It merely means that the numbers they present and the descriptions of their methods do not allow one to scrutinize the data in a way that supports or refutes their claims.

For anyone interested in a fun discussion of statistics and what they mean, I strongly recommend the classic text How to Lie with Statistics, by Darrell Huff. It’s a bit out of date, but still a lot of fun to read and educational for those who have not spent much time analyzing figures.

One thing Mr. Huff’s book does well is bring the reader into the discussion of data and how to present it. Much of his focus is on how advertisers manipulate their graphs and language in order to obfuscate the truth.

I don’t think this coffee extract paper is intentionally obfuscating the truth; rather, I think the confusion comes from an inability of the authors to present their data clearly (even to themselves, perhaps). I’ve worked in a number of labs with a number of scientists in my life, and I can say with conviction that not all scientists are equally able to analyze their data. In fact, I have seen a number of presentations in which the presenter clearly did not understand the results of their own experiments. And sometimes I have not understood my own data until presenting it before others allowed us to analyze it together (i.e., I am not exempt from this error).

I would love to have the opportunity to examine the raw data from these experiments to determine if they really do address the question – and whether, once addressed, the question is answered. I’m going to appeal to both the journal and the authors for more clarification on this and will report my findings here. 

 

 
Posted on July 23, 2013 in Uncategorized

Data is slippery stuff

My students know that I can go on and on endlessly about vaccines and immunology – and I also publish here on the same topic. It only makes sense; I may be an intro bio teacher most days, but I’ve spent most of my life working in immunology, including earning my degree in that field.

However, it’s not the only thing I harp on. For instance, I want my students to examine the data they are given and think about what that data means. Data is just data, i.e., numbers. If I told you “5652”, it would be meaningless; it becomes meaningful when units are applied and you know what those units truly stand for. That particular number would seem high if I said it was the number of dollars a hamburger at Five Guys cost (p.s. their burgers are terrific, just maybe not $5,652 terrific). It would seem low if I told you that was how many people lived in NYC (Google tells me the real number is about 8,244,910).

Image

What Do I Mean?

So, here’s some real data I was directed to this morning (whilst in my convalescence):

Make sure you look closely at it and interpret the data just from the information given (the article it references discusses the data broadly, but does not tell you anything more about it).

Click on the chart and see the article it refers to. What is that article TITLED?

I’m not going to write any more just now, but I do intend to return to this in a couple of hours (pending any comments)

 
Posted on April 20, 2013 in Education

Vaccinated, chapter 10: An Uncertain Future

A lot has been made of the putative link between vaccines and autism since the 1998 publication of ‘Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children‘ by Andrew Wakefield. In chapter 10 of Vaccinated, we will be discussing how this article started an epidemic of fear amongst the parents of young children that had a drastic effect on public health.

The author quotes Philip Roth’s The Human Stain as a testimony to the power of suggestion. “To hear the allegation is to believe it. No motive for the perpetrator is necessary, no logic or rationale is required. Only a label is required. The label is the motive. The label is the evidence. The label is the logic.”

In our discussion of this paper we will review the circumstances surrounding its publication, the data presented by the paper, what they mean and don’t mean and what efforts have been made to research the possibility of such a causal link.

A good review of the work done to examine the evidence for any such connections can be found here. We will review the data presented in this article and, since that article is a secondary source, we will uncover some of the primary data the review discusses and see how it compares to the original paper.

In putting this together, I was reminded of an interesting TED talk about how science is done and the necessity of critical examination of data. The talk, by Margaret Heffernan, “Dare to disagree,” is well worth the listen:

 

 
Posted on April 2, 2013 in Uncategorized

Data proving that my dice rolling is not bewitched

My wife has been convinced that I roll a preternatural number of doubles when we play backgammon. So I’ve been recording the data for the past couple dozen games we played, and I can now say, with the full force of statistics, that “I am not a witch – I mean, I am not bewitched.” It is true that over these twenty-odd games my average number of doubles per game is about 0.6 higher than hers, but a Student’s t-test demonstrates that this is well within predictable variation (p = 0.27).
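For the curious, here is what such a comparison can look like as a permutation test, a resampling alternative to the t-test (the per-game counts below are invented stand-ins, not our actual games):

```python
import random
from statistics import mean

# Invented per-game doubles counts for two players (12 games each)
mine = [4, 5, 3, 6, 4, 5, 7, 4, 3, 5, 6, 4]
hers = [3, 4, 5, 3, 4, 6, 3, 5, 4, 3, 5, 4]
observed = mean(mine) - mean(hers)  # about 0.58 doubles per game

# Shuffle the player labels and see how often chance alone
# produces a gap at least as large as the observed one
rng = random.Random(0)
pooled = mine + hers
hits, trials = 0, 10_000
for _ in range(trials):
    rng.shuffle(pooled)
    if mean(pooled[:12]) - mean(pooled[12:]) >= observed:
        hits += 1

p_value = hits / trials  # a large p means the gap is unremarkable
print(round(observed, 2), p_value)
```

With these invented counts the shuffled labels reproduce a ~0.6-per-game gap quite often, which is exactly the situation the t-test's p = 0.27 describes: no bewitchment required.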

 
Posted on July 3, 2012 in Uncategorized