I had an interesting text message from my cousin today. He was asking, ‘What is meant when a study is deemed to be flawed due to uncontrolled variables? i.e. what does it really mean to have uncontrolled variables?’
It’s an excellent question – and one that is well addressed in a book I recently recommended here called How to Lie With Statistics.
I gave him the following answer:
‘A simple example might be someone looking back through historical data and seeing that the number of cancer cases (of all kinds) has been on the rise over the past twenty years. In terms of absolute numbers, this is true. Some people use this to raise the alarm that we have to get more aggressive in our fight against cancer because it has become a leading killer. Perhaps that’s not a bad idea either, but if someone were to look more closely at the details they would quickly see that these absolute numbers aren’t the right data on which to base this conclusion. There are uncontrolled variables.
Here’s some real data:
The unaltered, or crude, cancer death rate per 100,000 US population for the year 1970 is 162.8. Multiplying this rate by that year’s US population of 203,302,031 and dividing by 100,000 gives the total number of cancer deaths for that year: 330,972. Dividing this number by the number of days in a year gives the average number of Americans who died of cancer each day in 1970: 907.
Twenty years later, in 1990, there were 505,322 cancer deaths in a total population of 248,709,873. The crude cancer death rate per 100,000 population therefore rose to 203.2, and the average number of cancer deaths per day was 1,384.
(Original data: The 1970 cancer death rate was taken from p. 208 of the Universal Almanac, John W. Wright, Ed., Andrews and McMeel, Kansas City and New York. The estimated 1996 cancer deaths figure was taken from Table 2 in “Cancer Statistics” by S.L. Parker et al., in CA, Cancer Journal for Clinicians, Vol. 46, pp. 5-27, 1996. The 1970 US population was taken from the World Almanac and Book of Facts, 1993, p. 367; the estimated 1996 population was from the 1997 edition of the World Almanac and Book of Facts, p. 382. The 1997 total cancer death figure was obtained from S.H. Landis et al., in CA, Cancer Journal for Clinicians, Vol. 48, pp. 6-30, 1998, Table 2. The US population for 1997 was obtained from the official statistics of the US Census Bureau released on Dec. 24, 1997.)
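The arithmetic in the two data paragraphs above can be checked with a few lines of Python. This is a minimal sketch: the figures are the ones quoted in the post, while the 365-day year and the helper names `total_deaths` and `crude_rate` are my own illustrative assumptions.

```python
# Reproduce the arithmetic from the 1970 and 1990 paragraphs.
# Figures are as quoted in the post; the 365-day year is an assumption.

DAYS_PER_YEAR = 365
PER = 100_000  # rates are expressed per 100,000 people


def total_deaths(rate_per_100k, population):
    """Convert a crude death rate (per 100,000) into an absolute count."""
    return rate_per_100k * population / PER


def crude_rate(deaths, population):
    """Convert an absolute death count into a crude rate per 100,000."""
    return deaths / population * PER


# 1970: start from the published crude rate and work out the totals.
deaths_1970 = total_deaths(162.8, 203_302_031)
daily_1970 = deaths_1970 / DAYS_PER_YEAR

# 1990: start from the published death count and work out the rate.
rate_1990 = crude_rate(505_322, 248_709_873)
daily_1990 = 505_322 / DAYS_PER_YEAR

print(f"1970: {deaths_1970:,.0f} deaths, {daily_1970:.0f} per day")
print(f"1990: {rate_1990:.1f} per 100,000, {daily_1990:.0f} per day")
```

Running it gives a 1970 total of 330,976 rather than the quoted 330,972. The small gap is presumably because the published rate of 162.8 is itself rounded to one decimal; both totals still work out to 907 deaths per day.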
However, if the analysis stops here, it’s useless. Expressing deaths per 100,000 people accounts for population growth, but not for the population’s age. In 1970 the life expectancy was about 67 years for a white, non-Hispanic male, while in 1990 that number was about 74.
Since cancer is largely a disease of the aged, it is likely that much of the increase in cancer deaths is explained by the growing number of elderly people. Age, in other words, is an uncontrolled variable.
What this means is that, for the study to be meaningful, the authors should look at cancer rates within a more comparable group: perhaps white, non-Hispanic, non-smoking males living in a particular region that has not undergone drastic demographic changes or heavy immigration or emigration. By taking these additional steps, we reduce the differences between our two populations, allowing us to make a ‘more controlled comparison.’
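To make the age argument concrete, here is a small Python sketch using invented round numbers (not real US data) for two hypothetical years. Within each age group the cancer death rate is identical in both years; only the share of people aged 65 and over changes. The sketch compares the crude rate with two ways of holding age fixed: comparing age groups like for like, and direct age standardization, a standard epidemiological technique that the post itself does not mention, which weights each year’s age-specific rates by one fixed ‘standard’ population (here, arbitrarily, year A’s). The data layout and function names are illustrative assumptions.

```python
# Hypothetical illustration (made-up round numbers, NOT real US data) of age
# acting as an uncontrolled variable. In both years the cancer death rate
# within each age group is the same; only the age mix of the population changes.

PER = 100_000  # rates per 100,000 people

# year -> age group -> (population, cancer deaths); all values hypothetical
data = {
    "year A": {"under 65": (900_000, 450), "65 and over": (100_000, 1_000)},
    "year B": {"under 65": (850_000, 425), "65 and over": (150_000, 1_500)},
}


def crude_rate(groups):
    """Deaths per 100,000 for the whole population, ignoring age."""
    pop = sum(p for p, _ in groups.values())
    deaths = sum(d for _, d in groups.values())
    return deaths / pop * PER


def age_specific_rates(groups):
    """Deaths per 100,000 within each age group (a like-for-like comparison)."""
    return {age: d / p * PER for age, (p, d) in groups.items()}


def standardized_rate(groups, standard_pop):
    """Direct age standardization: weight this year's age-specific rates by a
    fixed standard population, so a shifting age mix cannot move the result."""
    rates = age_specific_rates(groups)
    total = sum(standard_pop.values())
    return sum(rates[age] * n / total for age, n in standard_pop.items())


# Use year A's age structure as the (arbitrary) standard population.
standard = {age: p for age, (p, _) in data["year A"].items()}

for year, groups in data.items():
    specific = {age: round(r, 1) for age, r in age_specific_rates(groups).items()}
    print(year,
          f"crude={crude_rate(groups):.1f}",
          f"standardized={standardized_rate(groups, standard):.1f}",
          specific)
```

Year B’s crude rate comes out about a third higher than year A’s (192.5 versus 145.0 per 100,000), yet its age-specific and standardized rates are unchanged, so the whole apparent increase is produced by the older age mix. Restricting the comparison to a narrowly matched group, as suggested above, works on the same principle of holding the confounding variable fixed.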