How the internet and its bots are sabotaging scientific research
There was a time, just a couple of decades ago, when researchers in psychology and health always had to engage with people face-to-face or using the telephone. The worst-case scenario was sending questionnaire packs out to postal addresses and waiting for handwritten replies.
So, we either literally met our participants, or we had multiple corroborating points of evidence that indicated we were dealing with a real person who was, therefore, likely to be telling us the truth about themselves.
Since then, technology has done what it always does: created opportunities for us to cut costs, save time and access wider pools of participants on the internet. But what most people have failed to fully realise is that internet research has also brought risks of data corruption and impersonation, some of which may be deliberately aimed at putting research projects in jeopardy.
What enthused scientists most about internet research was the new capability to access people who we might not normally be able to involve in research. For example, as more people could afford to go online, people who were poorer became able to participate, as did those from rural communities who might be many hours and multiple forms of transport away from our laboratories.
Technology then leapt ahead, in a very short period of time. The democratisation of the internet opened it up to yet more and more people, and artificial intelligence grew in pervasiveness and technical capacity. So, where are we now?
As members of an international interest group looking at fraud in research (Fraud Analysis in Internet Research, or Fair), we’ve realised that it is now harder than ever to tell whether someone is real. There are companies that we scientists can pay to provide participants for internet research, and they in turn pay the participants.
While they do have checks and balances in place to reduce fraud, it’s probably impossible to eradicate it completely. Many people live in countries where the standard of living is low, but the internet is available. If they sign up to “work” for one of these companies, they can make a reasonable amount of money this way, possibly even more than they can in jobs involving hard labour and long hours in unsanitary or dangerous conditions.
In itself, this is not a problem. However, there will always be a temptation to maximise the number of studies they can participate in, and one way to do this is to pretend to be relevant to, and eligible for, a larger number of studies. Gaming the system is likely to be happening, and some of us have seen indirect evidence of this (people with extraordinarily high numbers of concurrent illnesses, for example).
It’s not feasible (or ethical) to insist on asking for medical records, so we rely on trust that a person with heart disease in one study is also eligible to take part in a cancer study because they also have cancer, in addition to anxiety, depression, blood disorders or migraines and so on. Or all of these. Short of requiring medical records, there is no easy answer for how to exclude such people.
More insidiously, there will also be people who use other individuals to game the system, often against those individuals’ will. We are only now starting to consider the possibility of this new form of slavery, the extent of which is largely unknown.
Enter the bots
Similarly, we are seeing the rise of bots that pretend to be participants, answering questions in increasingly sophisticated ways. Multiple identities can be fabricated by a single coder who can then not only make a lot of money from studies, but also seriously undermine the science we are trying to do (which is very concerning where studies are open to political influence).
It’s getting much more difficult to spot artificial intelligence. There was a time when written interview questions, for example, could not be completed by AI, but they now can.
It’s literally only a matter of time before we will find ourselves conducting and recording online interviews with a visual representation of a living, breathing individual, who simply does not exist, for example through deepfake technology.
We are only a few years away from such a profound deception, if not months. The British TV series The Capture might seem far-fetched to some, with its portrayal of real-time fake TV news, but anyone who has seen where the state of the art now is with respect to AI can easily imagine us being just a short stretch away from its depictions of the “evils” of impersonation using perfect avatars scraped from real data. It is time to worry.
The only answer, for now, will be to simply conduct interviews face-to-face, in our offices or laboratories, with real people who we can look in the eye and shake the hand of. We will have travelled right back in time to the point a few decades ago mentioned earlier.
With this comes a loss of one of the great things about the internet: it is a wonderful platform for democratising participation in research for people who might otherwise not have a voice, such as those who cannot travel because of a physical disability, and so on. It is dismaying to think that every fraudster is essentially stealing the voice of a real person who we genuinely want in our studies. And indeed, previous research has found between 20% and 100% of survey responses to be fraudulent.
We must be suspicious going forward, when our natural propensity, as amenable people who try to serve humanity with the work we do, is to be trusting and open. This is the real tragedy of the situation we find ourselves in, over and above that of the corruption of data that feed into our studies.
It also has ethical implications that we urgently need to consider. We do not, however, seem to have any choice but to “hope for the best but assume the worst”. We must build systems around our research whose sole purpose is to detect and remove false participation of one type or another.
The sad fact is that we are potentially going backwards by decades to rule out a relatively small proportion of false responses. Every “firewall” we erect around our studies is going to reduce fraud (although probably not entirely eliminate it), but at the cost of reducing the breadth of participation that we desperately want to see.
Mark Forshaw, Professor of Health Psychology, Edge Hill University
Jekaterina Schneider, Research Fellow of Sport Psychology, University of the West of England
This article is republished from The Conversation under a Creative Commons license.