Somebody get Freakonomics’ Stephen Dubner on the phone, because it sounds like an enormous leap, but apparently it’s true.
In a study led by UPenn’s Johannes Eichstaedt, certain tweets turn out to be correlated with an increased risk of heart problems, and not of the OKCupid variety.
What the researchers did was pull in just under one billion tweets posted in 2009-2010, and then filter them for standard language signaling stress or anger, as well as upbeat feelings.
They sifted through the data, and then compared it against local health and mortality stats from the CDC for the roughly 1,400 US counties the tweets originated from, which together cover about 90% of the country’s population.
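To make that filter-then-compare idea concrete, here is a minimal sketch in Python. It is not the researchers' actual pipeline: the tiny word list, the helper names (`anger_rate_by_county`, `pearson`), and every number in it are made up for illustration. It simply counts how often "angry" words show up in each county's tweets and then checks how that rate lines up with a per-county mortality figure.

```python
import math
import re
from collections import defaultdict

# Hypothetical mini-lexicon; a real study would use far larger word lists.
ANGER_WORDS = {"hate", "angry", "furious", "tired"}

def anger_rate_by_county(tweets):
    """Return, per county, the share of tweeted words found in the lexicon."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for county, text in tweets:
        words = re.findall(r"[a-z']+", text.lower())
        totals[county] += len(words)
        hits[county] += sum(1 for w in words if w in ANGER_WORDS)
    return {c: hits[c] / totals[c] for c in totals if totals[c] > 0}

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up toy data, purely illustrative: (county, tweet text).
toy_tweets = [
    ("A", "i hate this traffic so angry"),
    ("A", "great day at the park"),
    ("B", "lovely morning coffee with friends"),
    ("B", "so tired of this"),
    ("C", "hate hate hate mondays furious"),
]
# Made-up mortality figures per county, NOT real CDC numbers.
toy_mortality = {"A": 50.0, "B": 40.0, "C": 70.0}

rates = anger_rate_by_county(toy_tweets)
counties = sorted(rates)
r = pearson([rates[c] for c in counties], [toy_mortality[c] for c in counties])
print(f"county-level correlation: {r:.2f}")
```

The thing to notice is that everything is aggregated per county before any comparison happens, so the result says something about places, not about the individual people doing the tweeting.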
What they discovered was that this language was a pretty good predictor of atherosclerotic heart disease and cardiovascular risk.
How the *&$@-damned #&$^%ing &^%#@ing %$&*! angry language predicts heart disease better than age, smoking, obesity, hypertension and about 7 more leading indicators combined is both amazing and mysterious.
But they tested at least those two models, and somehow, for atherosclerotic heart disease, the Twitter model was significantly better.
So it’s possible this particular type of heart disease is more dependent on negative psychology or experiences.
(Btw, the psych association has been shown before, most clearly here, and here, but never with social language indicators.)
Furthermore, it turns out that positive language, although helpful when tested, was not nearly as good an indicator as negative language. This is oddly similar to the way most people’s psychology will focus on the 1 negative thing they hear instead of the 10 positive ones, because the negative one presents a risk.
And since it’s a correlation and not causation, the How+Why explanation has to get at least a little creative.
The typically obvious-once-you-explain-it theory goes something like this:
What the stressed-out language in tweets does is reflect not only a certain amount of stress in individuals, but also barometrize their immediate environments and even their greater communities.
This was jarringly evident because the individuals doing the tweeting weren’t the ones suffering from the heart problems, and were well below the age bracket where those typically crop up.
So, it would be a pretty big leap to turn things around the other way and predict that someone’s Grandpa is going to have a heart attack because a young whippersnapper is stressed out nearby, but for now at least, the county-level correlation seems quite strong.
Stay tuned for more Dubnerian big-data social media studies predicting increasingly freaky health stats and outcomes. I mean, if stock traders can use Twitter for market signals, why can’t bigger analytics extract more truths?
Check out the gritty details over at the links:
• Images: “Wireless 2”, by Gabriella Fabbri | “Stress”, by Bernard Goldbach
• Source: Penn Twitter AHD Study
• via: NewYorker
• More Coverage: NYM-ScienceOfUs | Washington Post | AnalyzeWords Twitter Personality Analyzer | Microsoft Twitter Study On Depression | Framingham Heart Disease Risk Factors
• Source Study: PsySci-Psychological Language on Twitter Predicts County-Level Heart Disease Mortality