Big data, as we all know, is going to save us. Used well it will make health systems solvent, introduce immortality, cool the planet, bring peace in the Middle East, and create a global pandemic of love, health, and happiness. But not yet—as we are still at the beginning. For now its achievements are more prosaic, but Imperial College’s Big Data and Analytical Unit (BDAU) Data Summit heard last week how big data can deepen our understanding of the NHS and the junior doctors’ dispute.
Twitter with its 955 million users and 6000 tweets a second is an instant source of big data. Sunir Gohil, a plastic surgeon, is doing a PhD on how health systems can learn from Twitter. He plans to gather data from Twitter on how the 172 NHS trusts are performing.
Some 80% of tweets come from mobile phones, and people whose appointment in Despond NHS Trust is delayed three hours may tweet “Sick and tired of waiting for these doctor bastards. They couldn’t run a whelk stall.” Meanwhile, the delighted patient in Rapture NHS Trust might tweet: “Doctor understanding, nurses delightful, food delicious. Full marks to Rapture.”
Gohil will develop algorithms to find the tweets from the trusts and then conduct a “sentiment analysis” to see how many are positive and how many negative. Marketeers are already doing this for many products, but it’s not yet done for health.
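To make the idea of a "sentiment analysis" concrete, here is a minimal sketch of the simplest possible approach: counting words from a positive list and a negative list. It is not Gohil's method, which the talk did not describe in detail. The word lists and the function names `sentiment` and `summarise` are assumptions made up for illustration, and a real system would need far richer vocabulary and handling of sarcasm.

```python
import re
from collections import Counter

# Hypothetical toy word lists, invented for this illustration only.
POSITIVE = {"understanding", "delightful", "delicious", "kind", "thanks"}
NEGATIVE = {"sick", "tired", "waiting", "bastards", "rude", "late"}


def sentiment(tweet):
    """Label a tweet by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarise(tweets):
    """Count how many tweets fall under each label."""
    return Counter(sentiment(t) for t in tweets)


if __name__ == "__main__":
    examples = [
        "Sick and tired of waiting for these doctor bastards. "
        "They couldn't run a whelk stall.",
        "Doctor understanding, nurses delightful, food delicious. "
        "Full marks to Rapture.",
    ]
    print(summarise(examples))
```

A word count like this will misread irony and negation ("not delightful"), which is one reason real sentiment tools are more elaborate.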
The great advantages of Twitter are, he pointed out, that it’s cheap, fast, real time, mobile, open, targeted, viral, and two way. The cycle of Twitter is to begin by listening, then to participate by adding your own messages, and finally to engage by responding to others and beginning conversations.
Every time his train is late Gohil tweets the train company saying “When will my train arrive?” “Never” answers the wit at the other end, who is probably promptly fired. NHS trusts might respond more positively to the man stuck in the waiting room: “The doctor is saving a life. We invite you for a free coffee and copy of Hello magazine.”
To illustrate the power of Twitter for his talk, Gohil gathered data on the junior doctors’ dispute. He can access every one of the 6000 tweets tweeted each second, and he developed an algorithm that allowed him to detect tweets mentioning the dispute—using hashtags and particular words. There will, of course, be false positives and false negatives, but the huge size of the database probably means they don’t matter much and won’t introduce bias.
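A filter of the kind described, using hashtags and particular words, might look something like the sketch below. The hashtags and phrases listed are assumptions chosen for illustration, not the ones Gohil used, and `mentions_dispute` is a hypothetical name.

```python
import re

# Assumed hashtags and phrases; the real algorithm's lists are not known.
HASHTAGS = {"#juniordoctors", "#juniorcontract"}
PHRASES = ("junior doctor", "strike ballot")


def mentions_dispute(text):
    """Return True if the tweet carries a listed hashtag or phrase."""
    lowered = text.lower()
    tags = set(re.findall(r"#\w+", lowered))
    if tags & HASHTAGS:
        return True
    return any(phrase in lowered for phrase in PHRASES)


if __name__ == "__main__":
    for tweet in [
        "Proud to march today #JuniorDoctors",
        "My train is late again",
        "My cousin is a junior doctor who loves football",
    ]:
        print(mentions_dispute(tweet), "|", tweet)
```

The third example is flagged even though it has nothing to do with the dispute, which is the sort of false positive the text mentions.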
He began by plotting the number of tweets and re-tweets against time, and there were big spikes on the day of the junior doctors’ march and when the BMA announced the results of the strike ballot. The “sledgehammer to nut” ratio of this wholly unsurprising and uninteresting finding is huge, but I was interested that re-tweets far outnumbered tweets: most people repeat the views of others rather than give their own views, which is perhaps simply the way of the world.
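Tallying tweets and re-tweets per day, the step before any plotting, can be sketched as follows. The record format (a day label paired with a re-tweet flag) and the sample records are invented placeholders, not Gohil's data, and the function names are hypothetical.

```python
from collections import Counter


def daily_counts(records):
    """Group (day, is_retweet) pairs into per-day tweet and retweet tallies."""
    counts = {}
    for day, is_retweet in records:
        kind = "retweet" if is_retweet else "tweet"
        counts.setdefault(day, Counter())[kind] += 1
    return counts


def retweets_per_tweet(counts):
    """Overall ratio of retweets to original tweets (None if no originals)."""
    tweets = sum(c["tweet"] for c in counts.values())
    retweets = sum(c["retweet"] for c in counts.values())
    return retweets / tweets if tweets else None


if __name__ == "__main__":
    # Made-up placeholder records, purely to show the shape of the input.
    sample = [
        ("day-1", False),
        ("day-1", True),
        ("day-2", False),
        ("day-2", True),
        ("day-2", True),
        ("day-2", True),
    ]
    counts = daily_counts(sample)
    for day in sorted(counts):
        print(day, dict(counts[day]))
    print("retweets per tweet:", retweets_per_tweet(counts))
```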
Next Gohil identified that some 45 000 people had tweeted on the dispute. How many, he wondered, were doctors? Using an algorithm that assumed people were doctors if they said they were in their profile or mentioned that they were, for example, an anaesthetist or something similar, he calculated that about 30% were doctors. So a high proportion of doctors have joined the debate on Twitter, and by looking at when they first tweeted Gohil could show that many of them had joined Twitter recently, presumably to make their views on the dispute heard.
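The profile check described above could be approximated with a simple keyword match, as in this sketch. The list of terms is an assumption made for illustration (the talk named only "doctor" and "anaesthetist or something similar"), and `looks_like_doctor` and `share_of_doctors` are hypothetical names.

```python
import re

# Assumed list of terms suggesting the account holder is a doctor.
DOCTOR_TERMS = ("doctor", "dr", "anaesthetist", "surgeon", "gp",
                "physician", "registrar")
PATTERN = re.compile(r"\b(?:" + "|".join(DOCTOR_TERMS) + r")\b", re.IGNORECASE)


def looks_like_doctor(bio):
    """Return True if the profile text contains any of the listed terms."""
    return bool(PATTERN.search(bio))


def share_of_doctors(bios):
    """Fraction of profiles that match; 0.0 for an empty collection."""
    bios = list(bios)
    if not bios:
        return 0.0
    return sum(looks_like_doctor(b) for b in bios) / len(bios)


if __name__ == "__main__":
    profiles = [
        "Anaesthetist, cyclist, dad",
        "Teacher and football fan",
        "Drummer",
        "GP trainee",
    ]
    print(share_of_doctors(profiles))
```

A match on whole words avoids counting "Drummer" as "Dr", but it would still count a profile reading "Dr. Who fan", so any percentage produced this way is an estimate.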
But the most interesting piece of information would be the views of the non-doctors who had joined the debate. Are they mostly supportive or critical? The public’s view on the dispute will be an important determinant, probably the main determinant, of the outcome. Analysis of tweets would allow junior doctors to know in real time whether the public is with or against them. Gohil hasn’t done this yet, but perhaps he will in time for when the strikes might begin.
I tried my own “small data analysis” of views on the junior doctors’ dispute by typing “junior doctors” into the search box on my Twitter site. This proved to be hopeless: not only was every tweet supportive of junior doctors but every one came from people I personally know—and I knew their views without ever having to resort to Twitter.
The failure of my small data experiment illustrates why we need big data to reach beyond the “usual suspects” and “chattering classes” (both of which terms apply to me) to find out what “real people” think—but perhaps real people don’t tweet.
Competing interest: RS (@richard56) was an early tweeter, joining Twitter in 2008. He has tweeted 10 900 times, follows 266 people, and has 6171 followers, annoyingly fewer than his brother (@ArfurSmith 40 000), his son (@FredSmith_ 6475), and Trish Groves (@Trished 6747).
Richard Smith was the editor of The BMJ until 2004.