Tag Archives: study

An Epidemiological Method: Using RFLP to Identify Strains of Pathogens

An excellent classroom resource for a case study in epidemiology is presented by the CDC. This study walks students through an outbreak of E. coli O157:H7 in Michigan.

The purpose of this study is to provide student investigators with the opportunity to walk through the procedures and rationale behind investigating the etiology and to develop experiments testing hypotheses generated by the students.

I am using this exercise as an end-of-semester project for my microbiology students to work through collaboratively now that we have completed our discussion of Paul Offit’s Vaccinated.

The study begins:



Escherichia coli O157:H7 was first identified as a human pathogen in 1982 in the United States of America, following an outbreak of bloody diarrhea associated with contaminated hamburger meat. Sporadic infections and outbreaks have since been reported from many parts of the world, including North America, Western Europe, Australia, Asia, and Africa. Although other animals are capable of carrying and transmitting the infection, cattle are the primary reservoir for E. coli O157:H7. Implicated foods are typically those derived from cattle (e.g., beef, hamburger, raw milk); however, the infection has also been transmitted through contact with infected persons, contaminated water, and other contaminated food products.

Infection with E. coli O157:H7 is diagnosed by detecting the bacterium in the stool. Most laboratories that culture stool do not routinely test for E. coli O157:H7, but require a special request from the health care provider. Only recently has E. coli O157:H7 infection become nationally notifiable in the U.S. Outside the U.S., reporting is limited to a few but increasing number of countries.

In the last week of June 1997, the Michigan Department of Community Health (MDCH) noticed an increase in laboratory reports of E. coli O157:H7 infection. Fifty-two infections had been reported that month, compared with 18 in June of 1996. In preliminary investigations, no obvious epidemiologic linkages between the patients were found. The increase in cases continued into July.

Students are then asked a number of introductory questions and then presented with the following problem:

Compare the DNA fingerprints in Figure 2 from seven of the Michigan E. coli O157:H7 cases. Each isolate has its own vertical lane (i.e., column). Controls appear in lanes #1, 5, and 10. Which Michigan isolates appear similar?

This question requires some background in DNA Fingerprinting (aka Restriction Fragment Length Polymorphisms, or RFLPs), which I want to take some time to explain.

As the source material states, the purpose of this test is to identify common strains of organisms through their DNA banding pattern. “Different DNA composition will result in different PFGE banding patterns. Bacteria descended from the same original parent will have virtually identical DNA and their DNA fingerprints will be indistinguishable. Identification of a cluster of isolates with the same PFGE pattern suggests that they arose from the same parent and could be from the same source.” (emphasis mine)

The method involves two core techniques. First, DNA from the target organism must be isolated and cut with one or more restriction enzyme(s). This will create a number of DNA fragments, where the precise number and size of fragments is determined by the sequence of that organism’s DNA.

As an example, let’s imagine a 10,000 base pair (bp) chromosome that we intend to cut with the restriction enzyme, EcoRI. EcoRI recognizes and cuts double stranded DNA at a specific sequence of 6 bases.


Figure: DNA cut by the Restriction Enzyme, EcoRI. A. DNA sequence with EcoRI recognition site highlighted and cut pattern illustrated. B. Enzyme binds to DNA at the recognition site. C. DNA has been cleaved.

On average, this enzyme will cut a random sequence of DNA once every 4096 bases (this can be estimated as 4 raised to the power of n, where n = the number of bases in the enzyme’s recognition sequence, or 4^6 = 4096 in this case). In our example, this suggests that a 10,000 bp chromosome will have about two EcoRI sites by random chance (10,000 / 4096 ≈ 2.4).
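The 4^n estimate is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes all four bases are equally likely and independent of one another, and the function names are my own, not part of any library.

```python
def expected_site_spacing(recognition_length: int) -> int:
    """Average number of bases between sites, assuming random, equiprobable bases."""
    return 4 ** recognition_length


def expected_site_count(genome_length_bp: int, recognition_length: int) -> float:
    """Expected number of recognition sites in a genome of the given length."""
    return genome_length_bp / expected_site_spacing(recognition_length)


spacing = expected_site_spacing(6)       # EcoRI recognizes 6 bases
count = expected_site_count(10_000, 6)   # our imaginary 10,000 bp chromosome

print(spacing)           # 4096
print(round(count, 2))   # 2.44
```

Real genomes are not random sequences, so the actual number of sites can differ quite a bit from this estimate.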

The circular chromosome should be cut twice by this enzyme, resulting in two fragments of DNA (see note #2, below). Let’s say the two bands are 4000 bp and 6000 bp.
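Here is a minimal sketch of why two cuts in a circle give exactly two fragments. The cut positions (1,000 and 5,000) are invented to reproduce the 4000 bp and 6000 bp bands of the example, and `circular_digest` is a hypothetical helper name, not a function from any bioinformatics package.

```python
def circular_digest(genome_length_bp, cut_positions):
    """Return sorted fragment lengths after cutting a circular molecule."""
    cuts = sorted(set(p % genome_length_bp for p in cut_positions))
    if not cuts:
        # An uncut circle stays as a single molecule.
        return [genome_length_bp]
    fragments = []
    for i, start in enumerate(cuts):
        # The last cut wraps around the circle to the first cut.
        end = cuts[(i + 1) % len(cuts)]
        length = (end - start) % genome_length_bp
        # A single cut just linearizes the circle into one full-length fragment.
        fragments.append(length if length else genome_length_bp)
    return sorted(fragments)


# Hypothetical cut positions chosen to match the example in the text.
fragments = circular_digest(10_000, [1_000, 5_000])
print(fragments)  # [4000, 6000]
```

Note that a linear molecule cut twice would give three fragments, which is why the circular shape matters here.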

We can see these two fragments by running them through agarose, which works as a molecular sieve, to separate the two fragments by size.

How does this work?

DNA is a negatively charged molecule, with that charge spread uniformly along the length of the fragment. Therefore, there is no difference in charge between our two fragments except in proportion to their length. This means that as they run through the sieve, the only difference between the molecules comes from their lengths. As with any sieve, smaller objects go through more easily, while larger ones are held up.

The result is that the two fragments will appear as distinct bands on a gel, with the smaller fragment running farther through the agarose than the larger. (Here, the smaller band at the bottom of the gel has migrated farther toward the positive electrode.)

If someone new were to become infected with this bacterium, we could isolate it from them, digest the DNA and get the same banding pattern. A closely related bacterium may have one additional EcoRI site. This would result in one of the two bands being cut into two smaller fragments, meaning that the two strains could be easily distinguished.
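The comparison can be sketched in code as well. The fragment sizes for the related strain (2,500 and 3,500 bp, from an extra site splitting the 6000 bp band) are made up for illustration, and the helper names are my own. A real gel comparison would also have to allow for measurement error in band sizes, which this sketch ignores.

```python
def banding_pattern(fragment_sizes_bp):
    """Bands as read down a gel from the wells: largest fragment first."""
    return tuple(sorted(fragment_sizes_bp, reverse=True))


def indistinguishable(fragments_a, fragments_b):
    """True when two digests give exactly the same set of band sizes."""
    return banding_pattern(fragments_a) == banding_pattern(fragments_b)


original_strain = [4000, 6000]
new_patient_isolate = [6000, 4000]    # same strain, isolated from a new patient
related_strain = [4000, 2500, 3500]   # hypothetical: extra site splits the 6000 bp band

print(indistinguishable(original_strain, new_patient_isolate))  # True
print(indistinguishable(original_strain, related_strain))       # False
```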

Back to the question posed above…

Given this, examine the following compilation of samples. Controls appear in lanes #1, 5, and 10. Which of the remaining isolates appear similar?



  1. Restriction Enzyme or Restriction Endonuclease– an enzyme that can recognize and cut DNA.
  2. Recognition Sequence – the sequence of bases that a restriction enzyme recognizes and binds to.



  1. In my example, we are using the restriction enzyme, EcoRI, to cut DNA from E. coli. As the name suggests, EcoRI actually derives from E. coli, where it functions as a defence against invading DNA, i.e. a virus. In order to do this successfully, E. coli will either not have any EcoRI restriction sites in its own DNA, or it will protect them by methylation so that the enzyme does not destroy the host’s own DNA. I am ignoring the possibility that the DNA we are dealing with in our experiment may not be cleavable with this enzyme.
  2. Also note that bacterial chromosomes are circular, rather than linear – interestingly, this means that they are not actually ‘chromosomes’ at all. Again, let’s ignore this.

Posted on April 18, 2014 in Uncategorized



A game for cramming micro students

Exam II in Microbiology happens this Tuesday. If only there were some less stressful way of studying for the exam. Perhaps a puzzle to kick back and contemplate …?

(As an aside, I really don’t like the way this puzzle turned out – for a crossword puzzle, there are very few words that cross. I may attempt to redo this later, but there’s an exam to write first)



Oh No!

Autocorrect strikes again – 1 Across = the DESTRUCTION of all microbial life.

I do apologize for my poor clue-writing. I’m only a recent adopter of crosswords and I’m not yet very good at writing them the way they should appear.

(I can only solve NYTimes’ Monday puzzles)



Posted on April 6, 2014 in Uncategorized



Experimental Flaws - Uncontrolled variables

I had an interesting text message from my cousin today. He was asking, ‘What is meant when a study is deemed to be flawed due to uncontrolled variables? i.e. what does it really mean to have uncontrolled variables?’

It’s an excellent question – and one that is well addressed in a book I recently recommended here called How to Lie With Statistics.

I gave him the following answer:

‘A simple example might be someone looking back through historical data and seeing that the number of cancer cases (of all kinds) has been on the rise over the past twenty years. In terms of absolute numbers, this is true. Some people use this to raise the alarm that we have to get more aggressive in our fight against cancer because it has become a leading killer. Perhaps that’s not a bad idea either, but if someone were to look more closely at the details they would quickly see that these absolute numbers aren’t the right data to draw this conclusion from. There are uncontrolled variables.

Here’s some real data:

The unaltered, or crude, cancer death rate per 100,000 US population for the year 1970 was 162.8. Multiplying this rate by the US population of that year, 203,302,031, and dividing by 100,000 gives the total cancer deaths for that year: 330,972. Dividing this number by the number of days in a year gives the average number of Americans who died of cancer each day in 1970: 907.

Twenty years later, in 1990, the total number of cancer deaths was 505,322 and the total population was 248,709,873. The crude cancer death rate per 100,000 population had risen to 203.2, and the average number of cancer deaths per day was 1,384.
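The arithmetic behind these figures can be reproduced in a few lines. This is just a sketch of the crude-rate calculation using the numbers quoted above; the function names are mine.

```python
def total_deaths(rate_per_100k, population):
    """Convert a crude rate per 100,000 into an absolute number of deaths."""
    return rate_per_100k * population / 100_000


def rate_per_100k(deaths, population):
    """Convert an absolute number of deaths into a crude rate per 100,000."""
    return deaths / population * 100_000


deaths_1970 = total_deaths(162.8, 203_302_031)
rate_1990 = rate_per_100k(505_322, 248_709_873)

# Within a few deaths of the 330,972 quoted; the gap comes from the rate being rounded to 162.8.
print(round(deaths_1970))
print(round(deaths_1970 / 365))   # 907 per day in 1970
print(round(rate_1990, 1))        # 203.2 per 100,000 in 1990
print(round(505_322 / 365))       # 1384 per day in 1990
```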

(Original data: The 1970 cancer death rate was taken from p. 208 of the Universal Almanac, John W. Wright, Ed., Andrews and McMeel, Kansas City and New York. The estimated 1996 cancer deaths figure was taken from Table 2 in “Cancer Statistics” by S.L. Parker et al, in CA, Cancer Journal for Clinicians, Vol. 65, pp. 5-27, 1996. The 1970 US population was taken from the World Almanac and Book of Facts, 1993, p. 367; the estimated 1996 population was from the 1997 edition of the World Almanac and Book of Facts, p. 382. The 1997 total cancer death figure was obtained from S.H. Landis et al in CA, Cancer Journal for Clinicians, Vol. 48, pp. 6-30, 1998, Table 2. The US population for 1997 was obtained from The Official Statistics of the US Census Bureau released on Dec. 24, 1997.)
However, if this is the limit of the analysis, it’s useless. In 1970 the life expectancy was about 67 years for a white, non-hispanic male, while in 1990 that number was about 74.
Since cancer is a disease of the aged, it is likely that the increase in cancer is directly linked to the increase in the elderly population.
What this means is that in order for the study to be meaningful, the authors should look at cancer rates among a more comparable group, perhaps white, non-hispanic, non-smoking males living in some certain region that has not undergone drastic demographic changes or excessive immigration / emigration. By taking these additional steps, we reduce the number of differences in our two populations, allowing us to make a ‘more controlled comparison.’
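To see how an uncontrolled variable like age can drive the crude rate all on its own, here is a toy calculation. Every number in it is invented purely for illustration (they are not real cancer statistics), and the names are hypothetical. The age-specific death rates are held identical in both populations; only the share of older people changes.

```python
# INVENTED age-specific death rates per 100,000, identical for both populations.
AGE_SPECIFIC_RATES = {"under_65": 100.0, "65_and_over": 1000.0}


def crude_rate(age_shares):
    """Crude rate per 100,000 as the age-share-weighted average of the group rates."""
    return sum(AGE_SPECIFIC_RATES[group] * share for group, share in age_shares.items())


# INVENTED age structures: the second population simply has more elderly people.
younger_population = {"under_65": 0.90, "65_and_over": 0.10}
older_population = {"under_65": 0.85, "65_and_over": 0.15}

print(round(crude_rate(younger_population), 1))  # 190.0
print(round(crude_rate(older_population), 1))    # 235.0
```

In this made-up case nobody’s risk at a given age has changed, yet the crude rate climbs by almost a quarter. Comparing like with like (the same age group in both years) is what removes that effect.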

Posted on September 20, 2013 in Uncategorized



Because it was on Dr. Oz, I’m more likely to think it’s a scam

I got something interesting in my inbox the other day. Something that I assume was a friend’s email address getting hacked – although it’s the least offensive (apparent) hack I’ve ever seen (he says as the viruses circulate around his computer’s RAM).

It was a nearly blank email with a link to a Dr. Oz clip about the weight-loss promoting effects of green coffee extract, which contains high concentrations of chlorogenic acids. These molecules are said to promote weight loss through increasing metabolism.

Being a scientist means being a skeptic. In this case, because I already feel like it must be BS due to its connection with Dr. Oz (an Oprah-elevated proponent of many untested, ‘alternative’ therapies), the challenge for me is to admit the possibility that this stuff may work. So, rather than looking through the data to see if there’s anything to deny the claim, I’m really trying hard to look at the data to see any glimmer of possibility.

Here’s a link to the Dr. Oz article online. The article was published in the January 2012 Diabetes, Metabolic Syndrome and Obesity, and happily the entire article is available free of charge. So let’s look at the data…

The article states that a “22-week crossover study was conducted to examine the efficacy and safety of a commercial green coffee extract product GCA™ at reducing weight and body mass in 16 overweight adults.” Half of the participants were male and half female – a typical study setup (although I do worry about how data is handled when looking at both sexes together, so let’s pay attention to that).

Dr. Oz’s website indicates that “The subjects (taking the supplement) lost an average of almost 18 pounds – this was 10% of their overall body weight and 4.4% of their overall body fat.” These are pretty hefty claims, but I could use losing 18lbs, so let’s see where this goes.

The study followed those eight men and eight women for 22 weeks. At the start of the study, the average body mass index (BMI) was 28.22 ± 0.91 kg/m². Determine your own BMI here.

Note the BMI categories:

< 18.5           underweight

18.5 – 25      healthy weight

25 – 30         overweight

30+               obese

This puts the study participants toward the high end of overweight (‘preobese’), but not yet obese.
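For anyone who would rather compute it than look it up, BMI is just weight in kilograms divided by height in metres squared. Here is a small sketch using the cut-offs listed above. How to treat a value sitting exactly on a boundary (25 or 30) is my own assumption, since the ranges as written overlap at their endpoints, and the 80 kg / 1.75 m person is a made-up example.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Category from the cut-offs above; boundary values go to the higher category (assumption)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "healthy weight"
    if value < 30:
        return "overweight"
    return "obese"


print(bmi_category(28.22))          # overweight (the study's starting average)
print(round(bmi(80, 1.75), 1))      # 26.1 for a hypothetical 80 kg, 1.75 m person
```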

Dosages of the green coffee extract and placebo were as follows:

“This study utilized two dosage levels of GCA, as well as a placebo. The high-dose condition was 350 mg of GCA taken orally three times daily. The low-dose condition was 350 mg of GCA taken orally twice daily. The placebo condition consisted of a 350 mg inert capsule of an inactive substance taken orally three times daily.”

I don’t think I’m the first one to point out that it’s hard to have a double blind trial when the dosages are distinguishable (two times vs three times daily). At least the placebo should be indistinguishable from the high dose.

One early eye-catching piece of data is from Table I, which summarizes the data of all participants as

BMI (kg/m²)    pre-study: 28.22 ± 0.91    post-study: 25.25 ± 1.19    change: -2.92 ± 0.85**, -10.3%

On average, all subjects lost weight during the study. But this really tells us nothing because we could see a 10% drop in BMI if the test arm lost 20% and the placebo arm stayed the same, or we could see the same thing if the weight loss occurred during ALL arms of the study.

Perhaps this reporting of data is justified by the next statement that participants all rotated through being on high dose, low dose or placebo with intervening washout periods. Presumably, this makes the most of a small sampling of people, but I do find it harder to be confident about the data. Then again, I have never been involved in any human trial of this kind.

Here’s the data:

High Dose arm:

start    BMI (kg/m²) 26.78 ± 1.55  ->    end 26.03 ± 1.36

Low Dose arm:

start    BMI (kg/m²) 26.25 ± 1.37  ->    end 25.66 ± 1.20

Placebo arm:

start    BMI (kg/m²) 25.66 ± 1.20  ->    end 26.67 ± 1.72

At first glance this might appear to be pretty good. But let’s graph it out:


The data continue to look great.

Now, with error bars:

Huh. Not so hot anymore.

Also, I’m not sure how this was done, but they get p values for HD p = 0.002, LD p = 0.003, placebo p = 0.384. These stats mean that the HD and LD groups are showing very significant differences, while the placebo group is not. You should be able to see this in the graph with error bars (as an approximation of significance). Unfortunately, I see a whole lot of nothing. But perhaps BMI is not the appropriate way to observe weight change (we are, after all, not seeing specific weight changes, but changes within a group, i.e. diversity).
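The eyeball test on the error bars can be written down explicitly. This sketch simply asks whether the start and end ranges (mean ± the reported value) overlap for each arm, using the BMI figures quoted above. I am assuming nothing about whether the ± values are standard deviations or standard errors, and the function names are my own.

```python
# (mean, reported ± value) at the start and end of each arm, from the figures above.
ARMS = {
    "high dose": ((26.78, 1.55), (26.03, 1.36)),
    "low dose": ((26.25, 1.37), (25.66, 1.20)),
    "placebo": ((25.66, 1.20), (26.67, 1.72)),
}


def interval(mean, err):
    """Range covered by an error bar drawn as mean ± err."""
    return (mean - err, mean + err)


def bars_overlap(a, b):
    """True when the two error bars share any part of their range."""
    (a_lo, a_hi), (b_lo, b_hi) = interval(*a), interval(*b)
    return a_lo <= b_hi and b_lo <= a_hi


overlaps = {arm: bars_overlap(start, end) for arm, (start, end) in ARMS.items()}
print(overlaps)  # {'high dose': True, 'low dose': True, 'placebo': True}
```

Every arm overlaps, which matches what the graph shows. One caveat: overlapping bars on group averages are only a rough guide. In a crossover design each person is compared with themselves, and a paired test of that kind can produce a small p value even when the group-level bars overlap, which may be part of what is going on here.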

Another way to try to see what’s going on is to take a look at the weight data:


The data were presented in a number of other ways, but each of these was confusing and didn’t illustrate any clear conclusion (my interpretation). If the individuals’ data were visualized as a scatter plot, this might show us something – or data for each individual’s change while in each group… As it is, we see unclear data with spectacular statistics, but we don’t get to see enough to be convinced of the changes.

Rather than go on and get more and more skeptical, let’s say that, although we don’t see a lot here, the data, as reported, would make us want to see a larger study with some revisions for control of diet, exercise monitoring and a change in the way dosage is administered so as to maintain the ‘blindness’ of the study.


Posted on July 22, 2013 in Uncategorized

