Fraud and questionable research practices - all that different?
Many scientific researchers have been found guilty of committing fraud in past decades. Diederik Stapel, a hotshot professor at Tilburg University, was found to have fabricated dozens of studies in the field of behavioral psychology. Another was the dermatologist William Summerlin, who initially claimed to have solved the problem whereby a patient’s immune system rejects donor skin following skin-graft surgery, by demonstrating that skin from black mice is accepted by the immune system of white mice if the skin is first marinated in a select mix of nutrients. However, he was later found to have simply taken the skin of a white mouse and coloured it in with a black marker pen (I’m not kidding).
The biologist Woo-Suk Hwang was also found guilty of fraud, after whistleblowers from his lab discredited his publication claiming that his team had successfully cloned human embryos. In actuality, nothing of the sort had been achieved: in his article he had tried to pass off two cell images from a single patient as cell images from two separate patients. Very recently, you may also have heard about another potential fraud case. Professor Francesca Gino, a rising star at Harvard University, is currently under investigation for fraud after a team of scientists found evidence indicating she manipulated data to ensure it supported her theory. Ironically, she is known for her research on ‘honesty’. Here is a link to the New York Times article on this, and to a relevant thread on Twitter.
When fraud cases like these become well known, they can hit hard - they make national and international headlines, and generally result in the researcher losing their job and being unable to gain similar employment. These outcomes are fair and to be expected - often the researchers are trusted and revered luminaries in their field, and that they had been fudging the data, while lying to collaborators, colleagues, and the public, sometimes for years, rightly comes as a shock. This reaction is also understandable given that such cases further erode our trust in science, and that tax-payers funded the wasted research to the tune of millions of dollars.
Questionable Research Practices
However, you may be surprised to learn that many more researchers - particularly in psychology - engage in similar practices that warp our understanding of their findings, yet experience no consequences; or, if there are consequences, they seem so minor relative to these official ‘fraud’ cases that the term ‘consequence’ feels like hyperbole. For instance, in one study around 50% of the psychology researchers surveyed stated that in their own scientific research they had presented an interesting finding as something they had expected prior to the data coming in, even though they had expected nothing of the sort. In support of this figure, 92% of the psychology researchers sampled in another study stated that they knew of other researchers who had engaged in this practice.
A further study found that 23% of the psychology researchers sampled had, in their own research, stopped collecting data earlier than planned because they had found the effect they were looking for. Nearly 60% of the researchers from this same study acknowledged that in previous research they had decided whether to collect more data only after first checking whether their proposed effects were statistically significant (i.e., whether the statistics suggested the relationships in their data might also exist in the population of interest).
On the face of it these research practices may not seem problematic, and certainly not as bad as the prominent fraud cases where key data was actively made up or falsified. However, these practices - generally referred to as questionable research practices - largely prevent the statistics in a published article from telling us much at all about what the investigated effect is likely to look like in the population of interest. Yet that is generally what the research sets out to achieve, and it is the basis on which the research was considered valuable and thus worthy of tax-payer investment. In the absence of questionable research practices, a statistically significant effect - often reflecting a significance (or p) value of less than .05 - tells us that the differences or relationships identified in a sample would be unlikely if there were no effect in the population (and so offers tentative support for the existence of such effects in the population).
But when questionable research practices are present, statistical significance will appear much more frequently even when there is no effect at the population level. A now-famous study set out to make this reality particularly clear: after purposely taking advantage of various questionable research practices (e.g., after finding nonsignificant effects, adding control variables to the analyses, excluding participants, etc.), the researchers were able to ‘demonstrate’ (via a now-significant p value - hooray!) that participants became younger after listening to ‘When I’m Sixty-Four’ by The Beatles relative to listening to the instrumental control track ‘Kalimba’.
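To get a feel for how badly these practices distort the false-positive rate, here is a minimal simulation sketch (all names and numbers are illustrative, assuming normally distributed data with unit variance and no true group difference): an ‘honest’ fixed-sample study is compared with one that peeks at the data every ten participants and stops as soon as p < .05 - the optional-stopping practice described above.

```python
import math
import random

def z_test_p(group_a, group_b):
    """Two-sided z-test p-value for a difference in means, assuming unit-variance data."""
    diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    se = math.sqrt(1 / len(group_a) + 1 / len(group_b))
    z = diff / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def run_study(optional_stopping, rng, start_n=10, step=10, max_n=50):
    """Simulate one two-group study in which the true effect is exactly zero.

    Returns True if the study ends up 'finding' a significant effect.
    """
    a = [rng.gauss(0, 1) for _ in range(max_n)]
    b = [rng.gauss(0, 1) for _ in range(max_n)]
    if not optional_stopping:
        # Honest study: analyze once, at the planned final sample size.
        return z_test_p(a, b) < 0.05
    # QRP study: peek every `step` participants and stop at the first p < .05.
    n = start_n
    while n <= max_n:
        if z_test_p(a[:n], b[:n]) < 0.05:
            return True
        n += step
    return False

rng = random.Random(42)
sims = 4000
honest = sum(run_study(False, rng) for _ in range(sims)) / sims
peeking = sum(run_study(True, rng) for _ in range(sims)) / sims
print(f"fixed-n false-positive rate:  {honest:.3f}")   # ≈ .05, as advertised
print(f"optional-stopping rate:       {peeking:.3f}")  # well above .05
```

Nothing about the data changed between the two conditions - only the analyst’s willingness to keep checking. With five looks at the data, the nominal 5% error rate roughly triples.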
In short, these practices, just like outright fraud, guide us towards the conclusion that effects exist (or exist at a particular size, or in a particular direction) in the overarching population, when there is no actual evidence demonstrating the stated effects. In sum, questionable research practices seriously warp our understanding of which effects exist in the population, and how big or small any effects are likely to be. Considering the huge amount of time and money that goes into such research, it’s evident that questionable research practices are a major problem for psychology and science in general. See below for good accounts by Chin and colleagues and by O’Boyle and Götz on the impact of questionable research practices:
“It is not hard for scientists, including criminologists, to get whatever research findings they want—evidence that a criminal justice policy or program is effective, support for a favored theory or new hypothesis, statistical significance for a surprising interaction effect (Ritchie, 2020; Sweeten, 2020). Sufficient use of questionable research practices (QRPs) (Simmons, Nelson, & Simonsohn, 2011) will do the trick. QRPs represent the steroids of the scientific publishing game because they artificially boost researchers’ performance (i.e., their ability to produce exciting, and therefore publishable, results; Bakker, van Dijk, & Wicherts, 2012).” (Chin et al. 2023)
“O’Boyle et al. (2017) labeled this ability of QRPs to transform ‘ugly’ initial results into ‘beautiful’ publications as the chrysalis effect. QRPs have the potential to rose-tint the literature and make it difficult to impossible to determine which findings are built upon the solid foundation of well-conducted and accurately reported research and which are built upon the sand of ‘white lies’ and lies of omission.” (O’Boyle & Götz, 2022)
But there are consequences to using QRPs, right?
The typical responses to questionable research practices are to notify the researcher so they can rectify the issue, to call the bad practice out on social media, or to have the article corrected or pulled from the journal. Mostly, however, there is no response. Questionable research practices are mighty difficult to identify, so such work often gets past the peer reviewers and editors who act as journal gatekeepers determining whether an article sees the light of day (via publication in an academic journal), let alone past other readers of the published research, who may be even less able to spot the red flags indicating that questionable research practices have been engaged in (e.g., the study is not pre-registered, effect sizes are unrealistic, effects are ‘just’ significant, etc.). Other reasons for the lack of response are that questionable research practices are widespread in academia and dealt with poorly, and that those who call them out often experience backlash from others in the academic community. Knowing that, why waste time calling it out and saying anything at all!
Critically, though, researchers who engage in questionable research practices very rarely receive the kind of consequences received by the likes of Diederik Stapel or Woo-Suk Hwang.1 This is the case even though the negative effects of the questionable research practices used in any given article are often more impactful than those resulting from fraud, and the collective impact of questionable research practices is certainly far greater. So how can it be that when researchers make up their data from scratch, or manipulate their collected data - both fraudulent actions - they have their careers ruined and are lambasted in mainstream and social media, yet when researchers lie in a published manuscript about what findings they initially expected, or torture their data until it yells out a significant result while keeping quiet about all the other effects investigated - practices that are collectively more problematic - there are generally no consequences?
Why the consequences of fraud and QRPs may diverge
In my opinion, the divergence in consequences between fraud and questionable research practices largely comes down to researchers guilty of fraud having very few excuses available to justify their actions. For instance, Stapel published research with statistics that were impossible given the sample size, and was found to have falsified data he had supposedly received from research assistants who entered it into a spreadsheet before passing it on to PhD students to analyze. William Summerlin, as already highlighted, was found to have coloured in white mice with a black marker pen to pass them off as black. Mart Bax reported to his academic employer that he had authored 161 publications; 64 of these were later found not to exist. And it was calculated that there was an infinitesimally small chance (about 1 in 150 million) that Yoshitaka Fujii, a Japanese researcher in anesthesiology, would have obtained the data reported across his 168 publications had he followed the typical data collection process and not engaged in data falsification. As we can glean, these fraudsters had few to no viable excuses available to explain away these oddities.
In contrast, when a researcher leaves out mentioning a dependent variable that was part of their confirmatory analyses, applies numerous outlier criteria until they achieve significance, or shifts from a two-sided to a one-sided statistical test to pull a nonsignificant finding beneath the significance threshold, they can always fall back on the position that they were only ever interested in whether an effect exists in a single direction (a justification for employing a one-sided test), that they wanted to provide a clean manuscript to assist the reading experience (a justification for leaving out that they measured a ‘problematic’ dependent variable), or that the chosen outlier criteria are, in their opinion, theoretically optimal (a justification for removing extreme data). In other words, in most cases questionable research practices, particularly in the absence of any study pre-registration (where researchers state in advance what they are going to do, and how), can be justified away as done with the best of intentions.
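The one-sided shift mentioned above is easy to make concrete: for a z-type test, the one-sided p-value is exactly half the two-sided one whenever the observed effect happens to point in the ‘predicted’ direction. A minimal sketch (the z = 1.8 test statistic here is made up for illustration):

```python
import math

def standard_normal_cdf(x):
    """Phi(x) for the standard normal distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def two_sided_p(z):
    """p-value testing 'an effect exists in either direction'."""
    return 2 * (1 - standard_normal_cdf(abs(z)))

def one_sided_p(z):
    """p-value testing 'the effect is positive' - valid only if predicted in advance."""
    return 1 - standard_normal_cdf(z)

z = 1.8  # hypothetical test statistic from a study
print(round(two_sided_p(z), 3))  # 0.072: not significant
print(round(one_sided_p(z), 3))  # 0.036: now 'significant'
```

Switching tests after seeing the result converts a failed study into a publishable one, which is precisely why the switch needs to be declared before the data come in.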
However, another key reason we treat data fabrication and falsification markedly differently from questionable research practices is arguably the normative academic culture that implicitly tells us we should. It’s only when we really give this normatively understood distinction some thought - pulling it apart to see what we’re left with, and putting the behaviors and their consequences side by side - that the line between the fraudulent and the merely ‘questionable’ blurs. Stapel made up data to support his theories because real data did not provide theoretical support. But many researchers even today adapt, analyze, or discuss their data in ways that they know ‘market’ the effect well beyond its rightful place.
Interestingly, researchers who engage in questionable research practices are generally appraised by default as if they did so to achieve virtuous ends. Does this mean the cases where questionable research practices were used were accidents, done with the best intentions in mind? In my view, it does not. Certainly, very many cases do represent accidents or a failure to fully grasp a practice’s magnitude - something unavoidable given the range of skills academics must learn to do good science, and the lack of time available to gain those skills. However, a sizeable number of researchers surely know the real impact of their actions, particularly given the widespread attention brought to these issues in recent years. These are researchers who know their actions embellish or warp the investigated effects, but who do it anyway to better ensure publication, publication in a high-end outlet, or the ability to report an unintuitive, novel, or catchy finding.
It may seem like I am driving at the point that fraudsters should be cut some slack. Nothing could be further from the truth. Rather, I’m simply calling attention to the extremely lenient attitude we seem to have towards questionable research practices, which ensures the scientific literature continues to be filled with junk. Surely, if we really care about better understanding reality, and about making sure tax-payers’ money is spent wisely, something has to change on the consequences side, right?
Anyway, next time you come across the latest news article about fraud in science, keep in mind that the problems plaguing science are actually far more widespread than such articles will likely make out, in large part due to the prevalence of questionable research practices. But also keep in mind that efforts are ongoing to alleviate this issue. For instance, many journals now require that research articles are pre-registered (which inhibits researchers’ ability to lie about a project’s real aim) and are accompanied by the raw data underpinning their findings (which enables the findings to be checked by others, or examined for questionable research practices). Journals are now also less likely to reject an article simply because its findings were nonsignificant, and some demand that researchers state explicitly in the text that they have not engaged in certain questionable research practices (e.g., leaving some measured variables unmentioned). Hopefully these steps, which ultimately aim to disincentivize questionable research practices, will soon be joined by others that may some day enable us to assess the academic literature with a more trusting eye.
There are certainly exceptions. Brian Wansink engaged in questionable research practices on steroids (and was not found guilty of fraud), which led to many manuscript retractions. Email communication came to light showing he had asked a collaborator to ‘squeeze some blood out of this rock’ (read: find some significant p-values). Following Wansink’s orders, another collaborator ran more than 400 different mediation models until one popped out a p < .05.↩︎
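To see why running 400 models until one ‘works’ all but guarantees a hit, here is a minimal sketch of the multiple-comparisons arithmetic - simplifying by assuming each test is independent and that no real effect exists (models fitted to the same dataset are correlated, so the exact numbers would differ, but the direction of the problem is the same):

```python
def family_wise_error(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one p < alpha among k independent tests of true nulls."""
    return 1 - (1 - alpha) ** k

print(f"{family_wise_error(1):.3f}")    # 0.050 - one honest test
print(f"{family_wise_error(20):.3f}")   # ≈ 0.642 - a modest fishing trip
print(f"{family_wise_error(400):.9f}")  # ≈ 1.0 - 400-model churn: a hit is virtually certain
```

At 400 attempts, the question is no longer whether a ‘significant’ result will appear, but only which one gets written up.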