Scientific Research Is Unreliable, Unreliable Scientists Report
Most peer-reviewed research is crap.
A recent piece in The Economist points to an issue that has been a recurring theme at OTB over the years: there’s reason to be deeply skeptical of most research published in scientific and other scholarly journals. While I’ve long noted the dubious methodology of medical research in particular, and of the vaunted peer review process in general, it’s actually much worse than I’d previously understood.
Over the past few years various researchers have made systematic attempts to replicate some of the more widely cited priming experiments. Many of these replications have failed. In April, for instance, a paper in PLoS ONE, a journal, reported that nine separate experiments had not managed to reproduce the results of a famous study from 1998 purporting to show that thinking about a professor before taking an intelligence test leads to a higher score than imagining a football hooligan.
A few years ago scientists at Amgen, an American drug company, tried to replicate 53 studies that they considered landmarks in the basic science of cancer, often co-operating closely with the original researchers to ensure that their experimental technique matched the one used first time round. According to a piece they wrote last year in Nature, a leading scientific journal, they were able to reproduce the original results in just six. Months earlier Florian Prinz and his colleagues at Bayer HealthCare, a German pharmaceutical giant, reported in Nature Reviews Drug Discovery, a sister journal, that they had successfully reproduced the published results in just a quarter of 67 seminal studies.
The governments of the OECD, a club of mostly rich countries, spent $59 billion on biomedical research in 2012, nearly double the figure in 2000. One of the justifications for this is that basic-science results provided by governments form the basis for private drug-development work. If companies cannot rely on academic research, that reasoning breaks down. When an official at America’s National Institutes of Health (NIH) reckons, despairingly, that researchers would find it hard to reproduce at least three-quarters of all published biomedical findings, the public part of the process seems to have failed.
Academic scientists readily acknowledge that they often get things wrong. But they also hold fast to the idea that these errors get corrected over time as other scientists try to take the work further. Evidence that many more dodgy results are published than are subsequently corrected or withdrawn calls that much-vaunted capacity for self-correction into question. There are errors in a lot more of the scientific papers being published, written about and acted on than anyone would normally suppose, or like to think.
Some of the reasons for this are ones I’ve noted from time to time: the insane pressure to publish, the proliferation of journals that have arisen to accommodate this requirement, and the reliance on small, cherry-picked experimental groups in some disciplines, especially medicine. But there’s a fundamental problem that’s actually much bigger.
Various factors contribute to the problem. Statistical mistakes are widespread. The peer reviewers who evaluate papers before journals commit to publishing them are much worse at spotting mistakes than they or others appreciate. Professional pressure, competition and ambition push scientists to publish more quickly than would be wise. A career structure which lays great stress on publishing copious papers exacerbates all these problems. “There is no cost to getting things wrong,” says Brian Nosek, a psychologist at the University of Virginia who has taken an interest in his discipline’s persistent errors. “The cost is not getting them published.”
First, the statistics, which if perhaps off-putting are quite crucial. Scientists divide errors into two classes. A type I error is the mistake of thinking something is true when it is not (also known as a “false positive”). A type II error is thinking something is not true when in fact it is (a “false negative”). When testing a specific hypothesis, scientists run statistical checks to work out how likely it would be for data which seem to support the idea to have come about simply by chance. If the likelihood of such a false-positive conclusion is less than 5%, they deem the evidence that the hypothesis is true “statistically significant”. They are thus accepting that one result in 20 will be falsely positive—but one in 20 seems a satisfactorily low rate.
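That “one result in 20” arithmetic is easy to check by simulation. A quick illustrative sketch (standard library only; a z-test on known-variance normal data stands in for whatever test a real study would run, and all numbers here are made up for the demonstration):

```python
import math
import random

random.seed(1)

def p_value(sample):
    """Two-sided z-test of 'mean = 0' for draws from a known-variance
    (sigma = 1) normal distribution."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

# Run 10,000 experiments in which the null hypothesis is TRUE
# (the true mean really is 0) and count how often p < 0.05 anyway.
trials = 10_000
false_positives = 0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(30)]
    if p_value(sample) < 0.05:
        false_positives += 1

rate = false_positives / trials
print(f"false-positive rate: {rate:.3f}")  # hovers around 0.05, as advertised
```

Run it and the rate comes out close to the nominal 5%: the statistics behave exactly as designed when a single hypothesis is tested once. The trouble, as the rest of the piece explains, starts with everything around that single test.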
The next part is incredibly wonky and, even simplified for readers of The Economist, a bit hard to follow.
In 2005 John Ioannidis, an epidemiologist from Stanford University, caused a stir with a paper showing why, as a matter of statistical logic, the idea that only one such paper in 20 gives a false-positive result was hugely optimistic. Instead, he argued, “most published research findings are probably false.” As he told the quadrennial International Congress on Peer Review and Biomedical Publication, held this September in Chicago, the problem has not gone away.
Dr Ioannidis draws his stark conclusion on the basis that the customary approach to statistical significance ignores three things: the “statistical power” of the study (a measure of its ability to avoid type II errors, false negatives in which a real signal is missed in the noise); the unlikeliness of the hypothesis being tested; and the pervasive bias favouring the publication of claims to have found something new.
A statistically powerful study is one able to pick things up even when their effects on the data are small. In general bigger studies—those which run the experiment more times, recruit more patients for the trial, or whatever—are more powerful. A power of 0.8 means that of ten true hypotheses tested, only two will be ruled out because their effects are not picked up in the data; this is widely accepted as powerful enough for most purposes. But this benchmark is not always met, not least because big studies are more expensive. A study in April by Dr Ioannidis and colleagues found that in neuroscience the typical statistical power is a dismal 0.21; writing in Perspectives on Psychological Science, Marjan Bakker of the University of Amsterdam and colleagues reckon that in that field the average power is 0.35.
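Power, too, can be estimated by brute-force simulation rather than formulas. A hedged sketch (one-sample z-test; the effect size and sample size are chosen by me so that the theoretical power works out to roughly 0.8, and are not drawn from any of the studies cited):

```python
import math
import random

random.seed(2)

def rejects_null(sample):
    """Two-sided z-test at the 5% level (known sigma = 1)."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return abs(z) > 1.96

# True effect: mean 0.4, sigma 1, n = 49, so E[z] = 0.4 * 7 = 2.8
# and the theoretical power is about 0.80.
trials = 5_000
hits = sum(
    rejects_null([random.gauss(0.4, 1) for _ in range(49)])
    for _ in range(trials)
)
power = hits / trials
print(f"estimated power: {power:.2f}")  # roughly 0.8: about 2 of 10 real effects missed
```

Shrink the sample size in that sketch and the estimated power collapses quickly, which is exactly how a field ends up with typical powers of 0.21 or 0.35.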
Unlikeliness is a measure of how surprising the result might be. By and large, scientists want surprising results, and so they test hypotheses that are normally pretty unlikely and often very unlikely. Dr Ioannidis argues that in his field, epidemiology, you might expect one in ten hypotheses to be true. In exploratory disciplines like genomics, which rely on combing through vast troves of data about genes and proteins for interesting relationships, you might expect just one in a thousand to prove correct.
With this in mind, consider 1,000 hypotheses being tested of which just 100 are true (see chart). Studies with a power of 0.8 will find 80 of them, missing 20 because of false negatives. Of the 900 hypotheses that are wrong, 5%—that is, 45 of them—will look right because of type I errors. Add the false positives to the 80 true positives and you have 125 positive results, fully a third of which are specious. If you dropped the statistical power from 0.8 to 0.4, which would seem realistic for many fields, you would still have 45 false positives but only 40 true positives. More than half your positive results would be wrong.

The negative results are much more trustworthy; for the case where the power is 0.8 there are 875 negative results of which only 20 are false, giving an accuracy of over 97%. But researchers and the journals in which they publish are not very interested in negative results. They prefer to accentuate the positive, and thus the error-prone. Negative results account for just 10-30% of published scientific literature, depending on the discipline. This bias may be growing. A study of 4,600 papers from across the sciences conducted by Daniele Fanelli of the University of Edinburgh found that the proportion of negative results dropped from 30% to 14% between 1990 and 2007. Lesley Yellowlees, president of Britain’s Royal Society of Chemistry, has published more than 100 papers. She remembers only one that reported a negative result.
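The Economist’s worked example takes a few lines to reproduce; this is just their arithmetic re-run, not new data:

```python
def tally(n_hypotheses, share_true, power, alpha):
    """Expected outcomes when testing hypotheses at significance level alpha."""
    n_true = n_hypotheses * share_true
    n_false = n_hypotheses - n_true
    true_pos = power * n_true           # real effects detected
    false_pos = alpha * n_false         # type I errors
    true_neg = (1 - alpha) * n_false
    false_neg = (1 - power) * n_true    # real effects missed
    positives = true_pos + false_pos
    return {
        "positives": positives,
        "share_of_positives_wrong": false_pos / positives,
        "share_of_negatives_wrong": false_neg / (true_neg + false_neg),
    }

high = tally(1000, 0.1, power=0.8, alpha=0.05)
low = tally(1000, 0.1, power=0.4, alpha=0.05)
print(high)  # 125 positives; 45/125 = 36% of them specious
print(low)   # 85 positives; 45/85 = 53% of them specious
```

The function makes the structure of the argument explicit: the share of positives that are wrong depends on power, the significance threshold, and the prior plausibility of the hypotheses, not on the 5% threshold alone.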
The problem, unfortunately, is getting worse rather than better. The use of statistics to make academic research, even in “soft” fields like psychology and political science, more “scientific” has become the norm over the last half century. Unfortunately, most of us in those fields—and for that matter, most chemists, physicists, and physicians—don’t truly understand the increasingly complicated statistics we’re employing. That is, we roughly understand what they’re supposed to do but not the math behind them. And that makes us oblivious to errors.
Statisticians have ways to deal with such problems. But most scientists are not statisticians. Victoria Stodden, a statistician at Columbia, speaks for many in her trade when she says that scientists’ grasp of statistics has not kept pace with the development of complex mathematical techniques for crunching data. Some scientists use inappropriate techniques because those are the ones they feel comfortable with; others latch on to new ones without understanding their subtleties. Some just rely on the methods built into their software, even if they don’t understand them.
Things have come a long way since I was in graduate school twenty years ago. At that point, doing complex statistical analysis was a much more labor-intensive exercise. At Alabama, at least, it still involved having data tapes loaded onto a mainframe and then writing one’s own code. Programs to do all this on one’s PC were about to hit the market and, of course, the Internet explosion put the databases at every scholar’s fingertips. But, while an obvious boon to those who are adept at statistical analysis, this movement made it easier, and thus even more mandatory, for everyone to use complex statistics while simultaneously taking scholars further away from understanding the steps they were taking. The machine does it all for you now, but most don’t understand what the “all” entails.
As it was, I spent the better part of two semesters working on a statistical paper on international conflict before coming to the point where I had zero confidence at all that the results were valid. I’d had several semesters of research methods coursework and gotten A’s in every class but realized that there were dozens of places where I could have screwed up the code and that my intuitive understanding of the math was so thin that I’d have no way of seeing the errors. Further, my academic advisor on the project, John Oneal—who actually had the requisite math skills—was unlikely to ever spot the errors given that there were hundreds of lines of code that he would never look at. Nor, incidentally, would any peer reviewer.
It’s quite possible that I could have turned all this into a dissertation and published numerous journal articles out of this effort had I kept at it. Professionally, I’d certainly have been better off than I was with the history and policy oriented dissertation I ultimately produced under Don Snow’s direction. But I simply had no confidence that I knew what I was doing.
This fits with another line of evidence suggesting that a lot of scientific research is poorly thought through, or executed, or both. The peer-reviewers at a journal like Nature provide editors with opinions on a paper’s novelty and significance as well as its shortcomings. But some new journals—PLoS One, published by the not-for-profit Public Library of Science, was the pioneer—make a point of being less picky. These “minimal-threshold” journals, which are online-only, seek to publish as much science as possible, rather than to pick out the best. They thus ask their peer reviewers only if a paper is methodologically sound. Remarkably, almost half the submissions to PLoS One are rejected for failing to clear that seemingly low bar.
The pitfalls Dr Stodden points to get deeper as research increasingly involves sifting through untold quantities of data. Take subatomic physics, where data are churned out by the petabyte. It uses notoriously exacting methodological standards, setting an acceptable false-positive rate of one in 3.5m (known as the five-sigma standard). But maximising a single figure of merit, such as statistical significance, is never enough: witness the “pentaquark” saga. Quarks are normally seen only two or three at a time, but in the mid-2000s various labs found evidence of bizarre five-quark composites. The analyses met the five-sigma test. But the data were not “blinded” properly; the analysts knew a lot about where the numbers were coming from. When an experiment is not blinded, the chances that the experimenters will see what they “should” see rise. This is why people analysing clinical-trials data should be blinded to whether data come from the “study group” or the control group. When looked for with proper blinding, the previously ubiquitous pentaquarks disappeared.
Other data-heavy disciplines face similar challenges. Models which can be “tuned” in many different ways give researchers more scope to perceive a pattern where none exists. According to some estimates, three-quarters of published scientific papers in the field of machine learning are bunk because of this “overfitting”, says Sandy Pentland, a computer scientist at the Massachusetts Institute of Technology.
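Overfitting of this kind is easy to demonstrate: dredge enough “tunable” models through pure noise and one of them will look impressive on the data at hand while being worthless on fresh data. A purely illustrative sketch (random binary “predictors” scored against random labels; no real dataset or machine-learning library involved):

```python
import random

random.seed(3)

n = 50          # observations
n_models = 200  # candidate "models" to tune over

labels = [random.randint(0, 1) for _ in range(n)]
predictors = [[random.randint(0, 1) for _ in range(n)] for _ in range(n_models)]

def accuracy(pred, truth):
    agree = sum(p == t for p, t in zip(pred, truth)) / len(truth)
    # A binary predictor can always be flipped, so count the better direction.
    return max(agree, 1 - agree)

# "Tune": keep whichever predictor looks best on the data we have...
best = max(predictors, key=lambda p: accuracy(p, labels))
in_sample = accuracy(best, labels)

# ...then score it on fresh random labels it has never seen.
fresh_labels = [random.randint(0, 1) for _ in range(n)]
out_of_sample = accuracy(best, fresh_labels)

print(f"in-sample accuracy:     {in_sample:.2f}")   # looks like a finding
print(f"out-of-sample accuracy: {out_of_sample:.2f}")  # back near chance
```

The in-sample winner beats chance handily even though every predictor is noise, because we searched 200 of them; held-out data deflates it. That gap is the whole overfitting problem in miniature.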
Similar problems undid a 2010 study published in Science, a prestigious American journal (and reported in this newspaper). The paper seemed to uncover genetic variants strongly associated with longevity. Other geneticists immediately noticed that the samples taken from centenarians on which the results rested had been treated in different ways from those from a younger control group. The paper was retracted a year later, after its authors admitted to “technical errors” and “an inadequate quality-control protocol”.
The number of retractions has grown tenfold over the past decade. But they still make up no more than 0.2% of the 1.4m papers published annually in scholarly journals. Papers with fundamental flaws often live on. Some may develop a bad reputation among those in the know, who will warn colleagues. But to outsiders they will appear part of the scientific canon.
Part of the problem is that we’ve reversed the order of operations, totally undermining a core tenet of research design, namely that analysis should follow theory rather than the reverse. Even two decades ago, it was simply gospel that good scientific research started with a hypothesis grounded in theory and that modeling and statistical analysis followed. But, increasingly, researchers are mining the data for interesting results and then crafting a theory to explain the outcomes. The reason for this has already been alluded to: academics need to publish and one’s findings need to be interesting to get published. But the value of the older way of doing things was that it led to extreme skepticism about interesting but totally counterintuitive results.
It gets worse:
John Bohannon, a biologist at Harvard, recently submitted a pseudonymous paper on the effects of a chemical derived from lichen on cancer cells to 304 journals describing themselves as using peer review. An unusual move; but it was an unusual paper, concocted wholesale and stuffed with clangers in study design, analysis and interpretation of results. Receiving this dog’s dinner from a fictitious researcher at a made-up university, 157 of the journals accepted it for publication.
Dr Bohannon’s sting was directed at the lower tier of academic journals. But in a classic 1998 study Fiona Godlee, editor of the prestigious British Medical Journal, sent an article containing eight deliberate mistakes in study design, analysis and interpretation to more than 200 of the BMJ‘s regular reviewers. Not one picked out all the mistakes. On average, they reported fewer than two; some did not spot any.
Another experiment at the BMJ showed that reviewers did no better when more clearly instructed on the problems they might encounter. They also seem to get worse with experience. Charles McCulloch and Michael Callaham, of the University of California, San Francisco, looked at how 1,500 referees were rated by editors at leading journals over a 14-year period and found that 92% showed a slow but steady drop in their scores.
As well as not spotting things they ought to spot, there is a lot that peer reviewers do not even try to check. They do not typically re-analyse the data presented from scratch, contenting themselves with a sense that the authors’ analysis is properly conceived. And they cannot be expected to spot deliberate falsifications if they are carried out with a modicum of subtlety.
Fraud is very likely second to incompetence in generating erroneous results, though it is hard to tell for certain. Dr Fanelli has looked at 21 different surveys of academics (mostly in the biomedical sciences but also in civil engineering, chemistry and economics) carried out between 1987 and 2008. Only 2% of respondents admitted falsifying or fabricating data, but 28% of respondents claimed to know of colleagues who engaged in questionable research practices.
Note that most of these errors are occurring in journals devoted to medicine and the “hard” sciences. As problematic as bad findings in political science are, we’re not making life-altering decisions based on the latest findings in the Journal of Politics.
While there are ways to fix some of this—I’ll focus on cross-field collaboration shortly—human nature dooms us here.
[R]eplication is hard and thankless. Journals, thirsty for novelty, show little interest in it; though minimum-threshold journals could change this, they have yet to do so in a big way. Most academic researchers would rather spend time on work that is more likely to enhance their careers. This is especially true of junior researchers, who are aware that overzealous replication can be seen as an implicit challenge to authority. Often, only people with an axe to grind pursue replications with vigour—a state of affairs which makes people wary of having their work replicated.
There are ways, too, to make replication difficult. Reproducing research done by others often requires access to their original methods and data. A study published last month in PeerJ by Melissa Haendel, of the Oregon Health and Science University, and colleagues found that more than half of 238 biomedical papers published in 84 journals failed to identify all the resources (such as chemical reagents) necessary to reproduce the results. On data, Christine Laine, the editor of the Annals of Internal Medicine, told the peer-review congress in Chicago that five years ago about 60% of researchers said they would share their raw data if asked; now just 45% do. Journals’ growing insistence that at least some raw data be made available seems to count for little: a recent review by Dr Ioannidis showed that only 143 of 351 randomly selected papers published in the world’s 50 leading journals and covered by some data-sharing policy actually complied.
It’s difficult enough to get people to spend hours doing peer review for journals in their field. There’s zero financial and precious little professional reward for doing so. Certainly, they’re not going to devote weeks to painstakingly replicating the work. Beyond that, the increasing specialization makes it next to impossible, anyway:
Software can also be a problem for would-be replicators. Some code used to analyse data or run models may be the result of years of work and thus precious intellectual property that gives its possessors an edge in future research. Although most scientists agree in principle that data should be openly available, there is genuine disagreement on software. Journals which insist on data-sharing tend not to do the same for programs.
Harry Collins, a sociologist of science at Cardiff University, makes a more subtle point that cuts to the heart of what a replication can be. Even when the part of the paper devoted to describing the methods used is up to snuff (and often it is not), performing an experiment always entails what sociologists call “tacit knowledge”—craft skills and extemporisations that their possessors take for granted but can pass on only through example. Thus if a replication fails, it could be because the repeaters didn’t quite get these je-ne-sais-quoi bits of the protocol right.
Taken to extremes, this leads to what Dr Collins calls “the experimenter’s regress”—you can say an experiment has truly been replicated only if the replication gets the same result as the original, a conclusion which makes replication pointless. Avoiding this, and agreeing that a replication counts as “the same procedure” even when it gets a different result, requires recognising the role of tacit knowledge and judgment in experiments. Scientists are not comfortable discussing such things at the best of times; in adversarial contexts it gets yet more vexed.
As alluded to earlier, one fix that would help address the glaring issue of researchers who are in over their heads in the math is cross-collaboration with those who aren’t. That may be unrealistic, in that there may be little professional incentive for a professional statistician or computer scientist to serve as a secondary author on work totally outside their field. But it’s an obvious solution. Specialization and division of labor have been the norm in other fields for well over a century; it makes no sense for physicians or political scientists to try to masquerade as mathematicians.
Hat tip: Charli Carpenter
James: “As alluded to earlier, one fix that would help address the glaring issue of researchers who are over their head in the math is cross-collaboration with those who aren’t. That may be unrealistic, in that there may be little professional incentive for a professional statistician or computer scientist to serve as a secondary author on work totally outside their field. But it’s an obvious solution. Specialization and division of labor have been the norm in other fields for well over a century; it makes no sense for physicians or political scientists to try to masquerade as mathematicians.”
If you look at the CVs of statisticians in academia, you will find very large numbers of papers based on interactions with researchers in other fields. Some are analyses using standard methods; others will be based on developing new methods.
I think that the big problems are what were outlined in the article:
1). Most scientists know little about statistical methods at the professional level.
2). Small power leads to the true vs. false positive ratio shrinking.
3). Specialization and complex work helps plain old experimental errors go undetected.
4). Vast quantities of studies and a bias for the novel means that false positives flourish.
seems ironic as it wasn’t long ago that scientists relied on the tedious collection of data by pencil/paper and a band of science majors. the time required to analyze data was beyond the realm. fast forward to now and we “should” have up to the minute data collection that’s analyzed immediately. i guess it all comes down to how it’s interpreted?
I wish I could say I’m surprised, but I’m not. As an astronomer, I frequently am shocked by the statistical ignorance of some, particularly in the bio-sciences. A friend I know who does (and understands) Bayesian analysis frequently rants about this. That they’ll look at 400 genetic markers, find 20 have a “95% confidence level” of correlation and think they’ve found something.
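The commenter’s arithmetic checks out: run 400 null tests at the 5% level and you expect roughly 20 “hits” by chance alone. A sketch (simulated null z-scores standing in for real genetic markers; nothing here comes from actual genomic data):

```python
import math
import random

random.seed(4)

# 400 markers with NO real effect: each test statistic is standard normal.
markers = 400
hits = 0
for _ in range(markers):
    z = random.gauss(0, 1)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    if p < 0.05:
        hits += 1

print(f"{hits} of {markers} markers 'significant' with no effect present")
# Expected count: 400 * 0.05 = 20
```

This is exactly why multiple-comparison corrections (Bonferroni, false-discovery-rate control) exist: without them, a long list of markers guarantees a handful of spurious “findings.”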
I think the problem is that journals really need someone in-house who knows stats and can vet any paper. Unfortunately, they don’t have the money for it.
I think a related problem is that, in the sciences, we’ve found most of the low-hanging fruit. Further discoveries are being made by scraping large databases for results. This means an increasing reliance on stat methods that many scientists don’t understand.
All is not lost, of course. The good thing about science is that there is a corrective mechanism and bad results will, in time, be revealed. But not before even more bad science comes out.
My real concern, however, is that this problem results in people dismissing good science or using it as an excuse to ignore science they don’t like. The classic example is global warming. The deniers like to jump on the statistical goof in the original hockey stick (conveniently ignoring that using the right methods gets the same result). You know this sort of thing is going to be dragged out by AGW deniers, intelligent design supporters, etc. to “prove” that all science is bunk, even while they continue to use their smart phones.
That tedious collection meant the researcher had intimate knowledge of the data as it was recorded and devoted considerable effort to ensuring its validity. The ease of collection now means that quantity is expected to make up for a lack of quality. That is a false belief.
Manufacturing went through this, hoping to fix quality at the end of the assembly line instead of devoting the time necessary on the front end. We got the US auto industry of the 1970s as a result. Six-sigma comes from 7P (prior proper planning prevents piss poor performance)
As an academic statistician, let me say that building collaborations with scientists is highly valued in Statistics, not just for the specific publication (though that never hurts) but for the opportunity collaboration often provides down the line to develop statistical methods that have an impact on real problems while giving insight into more general statistical questions. What is learned from such collaborations usually leads to new statistical ideas and thus tends to advance both fields.
Different institutional and disciplinary cultures can weaken or strengthen this incentive, however. And building good collaborations has a time and productivity cost in the short term. So it is worth emphasizing the joint benefit of inter-disciplinary collaboration and strengthening those incentives where possible.
In my own field, engineering, I find that less than 20% of practitioners are really any good at it. Most are hacks qualified to do routine work, nothing difficult, nothing innovative. I suspect the same is true of most fields, even fields requiring PhDs, Economics for instance. Most people don’t realize how mediocre their peers are, the Dunning-Kruger effect also applies to perceptions of peers. But in those fields, the 80+% still have to publish. And review.
_____
Per WIKI – “The Dunning–Kruger effect is a cognitive bias in which unskilled individuals suffer from illusory superiority, mistakenly rating their ability much higher than is accurate. This bias is attributed to a metacognitive inability of the unskilled to recognize their ineptitude. Actual competence may weaken self-confidence, as competent individuals may falsely assume that others have an equivalent understanding.”
As I’ve mentioned before from time to time back when I was in grad school I tutored social science graduate students on probability and statistics. I was appalled at how meager their understanding was and, worse, how uninterested they were in the subject. They just wanted enough to get by. They had no intuition or feeling for numbers.
That was a long time ago and the grad students I was tutoring are now senior faculty and heads of departments. If those I encountered were in any way typical and I suspect they were, it goes a long way to explaining the terrible conditions of scholarship.
Note, too, that physicians typically take just enough math to get into med school and not a bit more and lawyers take none at all if they can possibly help it. Congress, overwhelmingly dominated by lawyers, is just chock-full of guys whose last math class was in junior high and that was during the Eisenhower Administration.
One large contribution to the problem is regression to the mean. You get an initial result that seems strong and differentiated from your controls and thus repeat it for confirmation. Second repeats often do not show results as strong as the first. Repeat the experiment again and maybe the results still aren’t as strong as the first but not as weak as the second. Average the results and you’ve still got numbers that exceed 95% or 98% confidence and you go ahead and publish. Subsequent repetition by other labs produce the regression to the mean and the effects disappear. For these and other reasons we seldom take a single paper at face value to guide our research without trying to repeat the work first. Unfortunately, negative results tend not to get published. Instead they are passed as part of an ‘oral history’ among researchers in the field when people interact at meetings and or through their social networks.
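The regression-to-the-mean story above is easy to reproduce in simulation. A purely illustrative sketch (a small true effect plus measurement noise; only experiments whose first run clears a threshold get “followed up,” mimicking what gets pursued and published; all parameter values are mine, chosen for clarity):

```python
import random

random.seed(5)

TRUE_EFFECT = 0.2   # small real signal
NOISE = 1.0         # measurement noise per run
THRESHOLD = 1.5     # first run must look this strong to be pursued

first_runs, second_runs = [], []
for _ in range(20_000):
    first = TRUE_EFFECT + random.gauss(0, NOISE)
    if first > THRESHOLD:            # selected for follow-up
        second = TRUE_EFFECT + random.gauss(0, NOISE)
        first_runs.append(first)
        second_runs.append(second)

mean_first = sum(first_runs) / len(first_runs)
mean_second = sum(second_runs) / len(second_runs)
print(f"selected first runs average: {mean_first:.2f}")   # well above the true 0.2
print(f"their second runs average:   {mean_second:.2f}")  # back near the true 0.2
```

The first runs look impressive only because they were selected for looking impressive; the unselected second runs regress to the true effect. Averaging the two, as the commenter describes, still leaves an inflated estimate.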
This is true of all non-fiction publishing, though, isn’t it? Get results out fast fast fast, never mind the accuracy – and any step that slows the process down is increasingly marginalized.
Interesting how academia is so reflective of the outside world – perhaps the “ivory tower” is more like a bungalow.
Let’s hope that science doesn’t become as accuracy-immune as our political columnists.
It depends on the field. It’s virtually impossible to publish a “crap” paper in a good mathematics journal because the referees can follow every step in the proof of a theorem. The same is true in theoretical physics and chemistry articles if a calculation is made in paper and pencil form, like the great majority of studies carried out before the computer age. It gets harder if the paper reports the result of a numerical calculation, but even then an alert reviewer can usually detect fraud and inadvertent errors. Where it gets difficult is in reviewing purely experimental papers. It’s hard to find out if an author is fudging his data or misinterpreting it unless the reviewer repeats the experiment himself or is unusually well informed about the type of experiment. But even then the career consequences of a fraudulent study are so catastrophic if the fraud is detected that few scientists, I think, would try to publish fraudulent data. Faking your results and getting caught is professional death. Every graduate student in the sciences knows this. So I take the premise of this post with many grains of salt.
Let’s not get too nostalgic about the pen-and-paper era. We realize the problems with computer-aided research because they’re the same problems we confront in all our dealings with computers. Something in the wrong column, obsolete code, corrupted data. But those paper spreadsheets weren’t any better. They’re now yellowed, or missing. The calculations done by hand could have been just as bad as the modern, high-tech errors. And both old and new methods rely on the quality of the input data.
My approach is to believe nothing until at least 5 years after I first hear about it. And if the research tells me something I don’t want to hear, it’s 10 years.
OK, let me be the one to state the obvious. There are lies, damn lies, and statistics.
Thus ends my contribution to this thread.
@michael reynolds: That’s sound advice
@Hal_10000:
We’re in such an age that you don’t require that many bad eggs to poison a whole basket. This is the real danger.
The horrible part is how our current budget woes are draining grants for scientific research, so the institutional pressures presented here are only going to get worse.
Yes, the number of experimental papers that don’t have error bars on their data appalls me… That neat little bump in the middle of your data means absolutely zilch if your error bars run the length of the page.
And fuggetabbat other scientists being able to decipher computer code. Quite often, the scientist who wrote it doesn’t know what it does…especially several years after he wrote it.
The big problem here is that people expect scientific research to produce “Life altering results”, a phrase used by Dr. Joyner in his post. That’s actually asking too much. Science can help build an airplane or show how to make bacterially expressed insulin, but it can’t support a massive public established religion that tells everyone what to eat, what not to smoke, and with how much water to flush the toilet. This problem has existed for a long time—back in the early 1980s, my advisor commented to me that most of the biomedical literature was clinical, and most of that was garbage.
The core of people doing hard science, which includes the writer of this comment, are well aware of the problems discussed in this article and deplore them regularly in casual conversation. However, even those of us who have tenure in our academic positions don’t have tenure to do research—continued productivity is needed, and it is hard to do that while brawling publicly with armies of idiots with political axes to grind and constituencies providing them with resources.
What is extremely dangerous about current times is that the abuse of science has risen to a point where eventually the whole enterprise may lose support because of massive public policy abuses built on scientism (science treated as religion). The only solution is increased knowledge and understanding of real science among the masses of the population. We need more of the spirit of Richard Feynman and his skepticism during the Challenger hearings and less of James Hansen calling for people’s trials of AGW skeptics in the spirit of Lysenkoism under Stalin.
BTW, Michael Reynolds’ comment is exactly right…but make it 10-15 years.
@michael reynolds: There’s a very old saying that agrees with you: “Hear everything, trust nothing.”
With regard to biomedical science, I agree that part of the problem is that biomedical scientists (like myself) lack a sophisticated understanding of statistics, all too frequently engaging in mindless quests for P values below the magical 0.05 threshold.
Far more importantly, the system in which academic biomedical scientists operate–including “soft” money, very tight NIH budgets, cut-throat competition to obtain grant funding, nonsensical methods of evaluating productivity by grant reviewers and promotion committees, publish or perish, and overemphasis on which particular journals findings are published in–seems almost like it was designed to incentivize cutting corners and to encourage cheating.
I’m glad that this dirty little secret–that too much of what is published in my field is worse than useless–is coming to light. Biomedical scientists have been unable to reform the system from within. My hope is that the people paying for this research (you, fellow taxpayers, acting through the federal government) and other interested parties (pharma) will raise enough of a stink that reforms will be forced upon the NIH-funded biomedical research community.
Let me draw a sharp distinction here between fields that are being inept, and fields that are being willfully misleading. Epidemiology, I’m looking at you.
The problem of multiple hypotheses has grown enormously in recent years with the advent of enormous data sets (“Big Data”) that can be mined for information. Classical statistical methods, such as hypothesis testing with a 95% confidence threshold, are only valid when that is the only hypothesis you are testing, and you specified the hypothesis before you looked at the data. What happens today is that any hack with a large database and a copy of R or STATA can test literally millions of different hypotheses with a few lines of code. Of those, many will show ‘significant’ effects just by random chance. If you then publish those without mentioning that you tested millions of hypotheses, your findings are indistinguishable from science.
Real statisticians compensate for this effect by raising the significance bar as a function of how many hypotheses are being tested. The oldest version of this is something called a “Bonferroni adjustment”, which simply divides the significance level by the number of hypotheses. Test one, report at 95%. Test two, report at 97.5%. Test ten, report at 99.5%. Test a million, and any one of them would need to be significant at the 99.999995% confidence level to be a “reportable fact”. (Bonferroni is a crude approximation when you have that many hypotheses, but more accurate methods exist.)
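The effect described above is easy to demonstrate with a short simulation (a minimal sketch in Python using only the standard library; the variable names and the choice of 10,000 hypotheses are mine, for illustration). If every hypothesis you test is truly null, roughly 5% will still clear the uncorrected 0.05 bar by chance alone, while the Bonferroni-adjusted threshold lets essentially none through:

```python
import random

random.seed(42)

m = 10_000        # number of hypotheses tested, all of them truly null
alpha = 0.05      # the conventional significance level

# Under a true null hypothesis, the p-value is uniform on [0, 1],
# so drawing uniform random numbers simulates p-values from null tests.
p_values = [random.random() for _ in range(m)]

# Naive approach: declare anything below alpha "significant".
naive_hits = sum(p < alpha for p in p_values)

# Bonferroni: divide the significance level by the number of tests.
bonferroni_hits = sum(p < alpha / m for p in p_values)

print(f"'Significant' with no correction: {naive_hits} of {m}")
print(f"Significant after Bonferroni (threshold {alpha / m:g}): {bonferroni_hits}")
```

The naive count lands near 500 (about 5% of 10,000), every one a false positive; the Bonferroni count is almost always zero. Less conservative corrections such as Holm or Benjamini–Hochberg exist for the case of very many hypotheses, as the comment notes.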
In the epidemiology literature, you will now not only find lots of people failing to adjust, but a core of senior researchers publicly defending this failure to adjust, on the grounds that it’s less damaging to report lots of false positives than to miss a real effect. This is, of course, crap — especially given the fact that those irreproducible findings never quite go away. No amount of good science will now convince everyone that vaccinations don’t cause autism. No amount of good science will now convince everyone that cell phones don’t cause brain cancer. No amount of good science will now convince everyone that eggs are good for you.
@DrDaveT: I think with diet there are too many conflicting and overlapping factors to be able to pull out anything. Aside from telling people not to nosh on plutonium, probably every single dietary maxim out there has been (or will be) directly opposed by perceived wisdom within ten years.
@grumpy realist: Actually, Michael Pollan points to some convincing research in his various books, to the effect that any pre-industrial diet is good for you, and any industrial diet is bad for you. But I take your point.
You raise the other big issue, which is confusing correlation and causation. Extracting causal conclusions from empirical (as opposed to designed) studies is fraught with peril, and should be left to people whose expertise is in the statistics, not the science.
Since I believe everything I see in the movies I thought it would take 200 years!
http://www.youtube.com/watch?v=1yCeFmn_e2c
Pre-industrial food production was great for the farmers too!
http://brookfordfarm.com/wp-content/uploads/2011/04/OYHS-Plowing-650×487.jpg
You can’t be good at everything. You can hire statisticians to run the numbers and make sure things are correct, but that eats up research budgets. If we want valid results, we need to demand that the work be done correctly, and if that means hiring someone to do the stats, then it will just have to mean less research being published. Which is fine, as I am always behind in my reading. Which is also OK, since most of it is wrong.
Steve
@ernieyball:
Just to be clear, it’s the food that Pollan says needs to be pre-industrial, not the methods used to grow it. Well, with the possible exception of meats — it matters what the animals were fed, too, so the cows and pigs and chickens need to have a pre-industrial diet as well.
A couple of us (or our evil twins) have dropped reference to the underlying paper here at OTB in the past.
In fact, the “sorry state” article mentioned as related above uses the same research as its starting point.
The reiteration is fine, I suppose. I’d worry though that many will misunderstand the difference between new work, and more established understanding. There can be a great deal of “froth” in new work without it necessarily undermining the core knowledge in a discipline.
Thus, I worry that this article could be contributing to …
The Death Of Expertise
@DrDaveT:
I think Pollan’s “Eat food. Not too much. Mostly plants” is pretty conservative advice, and unlikely to be judged wrong anytime soon … or later.
(“Food” in that is a reminder that the closer you are to something that lived, without arcane processes inflicted on it, the better off you are. Cooking is not arcane, and is to be encouraged.)
Why would anyone want to use modern methods to produce pre-industrial food?
I’m going with these guys.
If we look at this food timeline http://www.foodtimeline.org modern growing methods can be used to produce everything that existed before 1750 because those foods are “pre-industrial” and anything developed later is not to be produced because it is “industrial”?
@ernieyball:
It is short-hand of course. Vegetables are frozen on an industrial scale, which makes healthy veg available to many at low cost. On the other hand, industry produces tubs of margarine, bottles of soda, and a great variety of “chips” of questionable utility. The more “food chemistry” involved, the further you are from “food.”
(As a general, and I think reliable, rule “supplements” are not really “food” and are thus to be avoided.)
@ernieyball:
Because that way you get more food, with less work, and it tastes good and is better for you?
I think you’re misunderstanding what I mean by “pre-industrial food”. I’ll be nice and assume that’s my fault so far.
Wheat is a pre-industrial food. That includes modern high-yield dwarf hybrids.
Corn is a pre-industrial food. I’ll even include genetically modified corn in that.
Beef is a pre-industrial food, if you get it from a cow that lived on grass.
Bread is a pre-industrial food, if you make it the old-fashioned way from scratch.
Bacon and eggs is a pre-industrial food, depending on what the pigs and chickens ate.
Etc.
Industrial foods are things that replace the traditional ingredients your great-grandma would have been familiar with (and which tend to rot or mold or go rancid pretty quickly) with processed or synthesized substitutes. If you want a quick primer on what those are, I recommend Steve Ettlinger’s book Twinkie, Deconstructed as a start. (He apparently also has a website here.) Michael Pollan’s The Omnivore’s Dilemma is a less neutral treatment of the same topic.
Just looked in my fridge. One 1 ltr bottle of cheap soda. Half empty. Been there for months.
No tubs. Love the store brand “Natural” peanut butter. Two ingredients. Peanuts, salt. (Right below the ingredients is the disclaimer in bold CONTAINS PEANUTS.)
Got a $1 off coupon for 2 bags of chips on the counter. Kettle Cooked is my game!
I think I get all the vitamins I need from the food I eat.
—–
If it wasn’t for chemistry and chemical reactions there would be no food. Seeds would not grow.
Everything is made of chemicals.
(except for dark matter?)
@DrDaveT: NBA all day today in honor of Dr. Martin Luther King Jr.
I am off to the local Buffalo Wild Wings to catch the Bulls hopefully knock the Lakers out of their wheelchairs
while I glom down some post-industrial chicken!
Smell ya later!
I note an upsurge in “everything is chemicals” after the WV spill.
Perhaps to imply that coal solvents are fine.
After all, water is a chemical.
@john personna:
You beat me to it.
@john personna:
I think it followed the upsurge in people saying not to eat certain foods because they have chemicals in them. That wording doesn’t mean what the speakers intend, and it makes people with a reasonable point look foolish for stating it poorly. The speakers in most cases mean synthetically produced foods, highly refined foods, or foods with synthetically produced preservatives, although sometimes they mean GMO foods.
In general it amounts to people that know a little dismissing people that know less without engaging the intended argument.
More precisely, a chemical compound composed of two chemical elements…also called the universal solvent.
So how is the pre-industrial diet meat slaughtered?
————————
Chicken that has been hit in the head with a post, cooked and eaten.
@ernieyball:
Um, by killing it and cutting it up?
I was about to say “the same as any other meat”, but then I remembered how current mass production slaughterhouses in the US work. So I’ll amend that to “in some way that cleanly separates the meat from the feces”.
I can see I chose my adjectives unwisely up above. All I’m really talking about here is everything that isn’t processed food, or fattened on foods the animal wouldn’t eat in the wild. And it’s not my personal diet; it’s the one that seems to keep people healthy.
@Hal_10000: Come back and re-read that last paragraph in ten years.
“Scientists” produce results that they are paid to produce. Like most professions, they strive to please their customers so they, too, can receive a paycheck and feed their families.
@grumpy realist: http://www.phdcomics.com/comics.php?f=1476
I’m particularly fond of “I put the numbers into this magic box and out came my thesis!”
What is the accepted pre-industrial method of killing livestock?
@teapartydoc:
Actually, you’ve had 10 years already, more than once.
(1979) US National Academy of Sciences report finds it highly credible that doubling CO2 will bring 1.5-4.5°C global warming.
@ernieyball:
OK, I get it. My bad for thinking you were actually asking serious questions.
(Though I suppose it’s possible you just have a thing about slaughter, in which case you might want to seek professional help.)
No. You are the one advocating for the pre-industrial diet because “that way you get more food, with less work, and it tastes good and is better for you?”
I have in the past killed (slaughtered), cleaned, cooked (processed), and eaten animals I caught in the wild. I lived on my cousin’s farm several summers, so I know the taste of fresh-killed chicken and pork.
Today I opt to pay someone else to do as much of that as I can afford.
Pardon me for asking questions about pre-industrial food processing. I just wanted to see if it is similar to what I have experienced.
If I want advice about my state of mind I will be consulting with others. Not you.
@ernieyball:
Right. And so…?
Me too. What on earth does this have to do with what you quoted above?
OK, I see the disconnect now.
No, you don’t pay someone to do what you remember. What they do to produce that hunk of meat at the local grocers bears very little resemblance to what you saw on the family farm. The animal was not raised on a family farm, was fattened much faster, and was slaughtered in a big machine that reminds me of the old Monty Python “architect sketch” — conveyor belts, rotating knives, etc. I have no problem with automation, except this particular process isn’t very picky about keeping the inside and the outside of the animal separate.
Fair enough. Yes, what you experienced is “pre-industrial” as I’ve been using the term.
Apologies for the tone of the previous comment; I couldn’t imagine why you were focusing on the slaughter, when I was trying to talk about the food content. The only intersection (that I know of) is in the likelihood of getting E. coli or salmonella into the product.
@DrDaveT:
It is amazing the lengths to which this is taken. Even out here in the middle of the Pacific, the cattle ranches only raise the beef cows to yearlings; then they are put on a boat to travel over 2,000 miles to Long Beach, then taken by train to industrial farms in the Midwest to be fattened for slaughter. We have the largest privately owned cattle ranch in the US and only one slaughterhouse left on the island.
I pay some one to kill, process, transport, cook and on a good day deliver it to my table with a smile!
I am under no illusions that It resembles “what I remember”.
In fact the further away I can get from the stink of the hog pen and the chicken coop the better.
So how would you do it?
@ernieyball:
They aren’t my animals and it’s not my land, so it isn’t my decision to make, but if I had my druthers more of the ranches would raise them to slaughter here. When they did, we had cheaper and better beef here. Now it is almost all shipped from the mainland. The consumer here doesn’t see savings and gets an inferior product. The largely absent ‘ranchers’ (mainly in CA and Japan) are mainly keeping cattle on the land for tax purposes until it is economically feasible to further develop the land for housing or hotels. They make a bit more money for their hassle sending the cows off as yearlings because they don’t have to keep up with the animals after the first year. That doesn’t help me as a consumer or as someone that cares about the way food animals are treated. All in all it means I eat less beef, which is probably better for me.
I am a childbirth educator, and the “science” used to defend current birth protocols is so warped by drug company money and influence that at times it feels as though the whole obstetric profession is nothing more than some giant, fat, bulbous emperor waddling around the corridors of the local hospital naked.
Want to talk about real world effects of bad science?
The US maternal mortality rate is going up!
Only a few concerned whistleblowers have bothered to notice and speak up, and we have been subjected to long standing campaigns of hate and derision.
We have to divorce science from government and stop using taxpayer money to fund this insanity.
Jenny Hatch
http://WWW.JennyHatch.com
@Jenny Hatch:
I was with you right up to that point. I think you need to re-think.
There are really only 3 possible sources of the kind of research that requires deep pockets: government, industry, and academia. This thread started out with the pitfalls of the current academic model, and you are well-aware of the pitfalls of having all of the research done by for-profit companies. That leaves government.
I’ve worked at a government lab, and in academia, and in private industry, and in the not-for-profit sector. The quality of the research at the government lab was far and away the best. Only wingnuts believe that “The Government” is some kind of monolithic entity with a unified agenda.
I think Jenny might be saying that in a “follow the money” sense, government funds are used to create “revenue streams,” and that in a classic agency problem the beneficiaries of those streams shape the agenda.
I agree with DrDave that cutting the funding isn’t the answer.
Fight the “agents” of those agenda, be they pharmaceutical companies seeking to justify a questionable drug, or university administration trying to expand their patent portfolio.
As a taxpayer, I really hate to “pay twice” for a tax funded invention.
@Jenny Hatch:
I have a 2-month-old baby girl, and that was not at all my experience. Kaiser gave us a private room with a place for me to sleep so I could stay comfortably with my wife through the entire process. They fully supported our decision to have the delivery performed by a nurse-midwife (they have at least one midwife and one OB on the ward 24/7). They supported my wife’s decision to go without drugs and provided a large pool (6′ dia × 2′ deep) in the room for only $42. The pool of warm water allowed her to make it through transition drugless.

About 20 hours after her water broke, her contractions slowed from one every 4 minutes to one every 8 minutes. At that point the midwife explained to my exhausted wife and me that we could continue on drugless and would probably deliver in 8–12 hours, or if we wanted she could start a Pitocin drip to speed things up. My wife opted for the Pitocin drip, and Sophie came in 2 hours. As soon as she was wiped down, Sophie was placed on my wife’s chest and was nursing within 15 minutes.

We were moved to a ‘mommy and me’ room where she and I and baby stayed until her release. Kaiser encouraged skin-to-skin from the beginning and encourages and supports breast feeding. All of those decisions were based on research that they provided on request. All in all, I don’t think our experience could have gone better. Kaiser is rather progressive, particularly on the West Coast and Hawaii, so YMMV. Is it so different back East?