SES New York Keynote with Duncan Watts
Good morning and welcome to New York! We’re in the Grand Ballroom for the opening of SES New York 2011. As I mentioned in our liveblogging guide, I haven’t been to this conference for six years. First impressions: Hilton’s wifi sucks. The coffee is pretty tasty. There are no tables or outlets for the livebloggers. I’ve been spoiled by the West Coast.
Our keynote speaker is Duncan Watts, Principal Research Scientist for Yahoo!. If the big screen can be trusted, his topic will be Using the Web to Do Social (Media) Science.
Mike Grehan introduces the conference and welcomes us to New York. Thanks, Mike! He plugs SEMPO, bashes PageRank, makes jokes and it’s all a good time. Finally, he introduces Duncan Watts and we’re off!
Duncan starts off with a trip to the 1940s to Lasswell’s Maxim: “The key to understanding what’s going on in communications science is ‘who talks to whom about what, through which channel and to what effect?’”
This is a very difficult task though it seems simple on the surface. He thinks that only now, 70 years later, are we getting to the point where we might be able to begin to answer it. Instead, he’s going to cover 4 projects dating back to 2001 that answer part of Lasswell’s question.
Six Degrees of Separation
Based on the 1960s experiment with a single “target” in Boston and 300 other individuals who were “senders”. Each sender had to get a packet to the target, but could send it to him directly only if they knew him on a first-name basis. About 64 packets reached the target through about 6 connections.
In 2001, they tested it on a larger scale. They chose 18 targets around the world and had 24,000 people sending packets. The packets passed through 166 countries and over 60,000 people. 400 reached the targets. The median chain was 7 people.
This led to the discovery of the “Bored at work” principle. All you need is something that is vaguely entertaining for about five minutes without making noise and you can get them to do
Success in Cultural Markets
Cultural markets are books, music, arts, etc. Things that we value but not in a quantifiable way. “Hits” in cultural markets are many times more successful than average. Success seems obvious in retrospect but it’s hard to predict.
They created a music lab with 48 unknown bands. There were two “conditions”. In the social information condition, each band had a visible signal of how many times other people had downloaded the song. In that condition there were 8 “worlds”, each of which only counted information collected in that world. In the independent condition, you only got the band name and song title, with no download information.
They did it four times with different conditions, first with teens and then with the adults from the previous six degrees experiment as well. The “strong” test correlated rankings with order, the “weak” did not. The last experiment flipped the order entirely, making the best songs seem the least popular.
They discovered that individuals are influenced by their observations of the choices of others. The stronger the signal, the more they are influenced. The more information you give them at the individual level, the less they’re influenced by the crowd but the collective choice reveals less and less about what the individual prefers. You can create self-fulfilling prophecies for a song but not for an entire market.
The trouble with this experiment was that it was linear. You could only be influenced by what came before; there were no social interactions.
Then came Twitter, which is ideally suited as a fully-observable network of “who listens to whom”. It includes many types of “actors”:
- CNN, NYTimes
- Governments and Fortune 500
- Celebrities, bloggers, journalists, experts
- Ordinary individuals
And it has URL shorteners like bit.ly which allow you to see information flows.
Classifying users with Lists
How do you categorize the types of users on Twitter? In 2009, Twitter introduced Lists in order to help people filter their feeds according to popular topics. Watts and co treat lists as crowd-sourced labels for the users who appear on them. They focus on four categories of “elite users”: celebrities, media organizations, bloggers, companies.
The conventional wisdom is that user attention has fragmented in all directions. What they found was that in spite of the fragmentation, about 50% of the tweets that people receive on Twitter come from 20K people. Celebrities outrank all other categories, followed by media, then orgs, then bloggers.
Elite users are more active per-capita. However, ordinary users collectively introduce many more URLs. Media companies produce the most URLs per organization.
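To make the "50% from 20K people" kind of figure concrete, here is a minimal sketch of how such a concentration share could be computed. The input format (a mapping from author to the number of tweets received from that author) and the function name `top_k_share` are my own assumptions for illustration, not anything shown in the talk.

```python
def top_k_share(received_by_author, k):
    """Fraction of all received tweets that come from the k most-heard authors.

    `received_by_author` is a hypothetical dict: author -> number of tweets
    that other users received from that author.
    """
    counts = sorted(received_by_author.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[:k]) / total


# Toy data, invented purely for illustration.
toy = {"celebrity": 50, "news_org": 30, "blogger": 20}
share_of_top_account = top_k_share(toy, 1)
```

On real data you would set `k` to 20,000 and see how close the share comes to one half.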
Bloggers retweet the most, celebrities almost not at all.
There’s a two-step flow of information, theorized in the 1950s. Opinion leaders pass on information from mass media to the rest of society, instead of media delivering the information directly. They tried to see if this was the case with Twitter. 40% don’t listen to the media at all. Of the remaining 60%, 46% came indirectly. There’s a huge distribution of intermediaries. Ashton Kutcher is the largest intermediary, not just because he has so many followers but because he retweets more than more-followed celebs like Lady Gaga.
Compared with ordinary users, intermediaries:
- Tweet more often
- Have more followers
This is consistent with the Two-Step Theory.
There’s a striking concentration of attention on Twitter, but it’s impossible to tell what the “effect” is based on their discoveries so far. So, they took another step to
Twitter Influence Project
They counted bit.ly URLs, but only those that were retweeted. Most are not passed on and almost all cascades are small and shallow. A tiny fraction are large and propagate up to 8 hops. The largest cascades get tens of thousands of retweets.
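For anyone wondering what "size" and "hops" mean in practice, here is a rough sketch of measuring a single cascade. The edge format (pairs of "who passed the URL to whom") and the function name `cascade_stats` are assumptions of mine; the talk did not describe how the data was actually stored.

```python
from collections import defaultdict, deque


def cascade_stats(edges, seed):
    """Return (size, depth) of the cascade that starts at `seed`.

    `edges` is a hypothetical list of (source_user, reposting_user) pairs,
    meaning reposting_user picked the URL up from source_user.
    Size counts every user reached, including the seed; depth is the
    largest number of hops from the seed.
    """
    children = defaultdict(list)
    for source, reposter in edges:
        children[source].append(reposter)

    seen = {seed}
    queue = deque([(seed, 0)])
    depth = 0
    while queue:
        user, hops = queue.popleft()
        depth = max(depth, hops)
        for nxt in children[user]:
            if nxt not in seen:  # count each user once
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return len(seen), depth


# Toy cascade: a -> b, a -> c, b -> d
size, depth = cascade_stats([("a", "b"), ("a", "c"), ("b", "d")], "a")
```

A URL that nobody passes on comes out as size 1, depth 0, which by Duncan's account is what most of them look like.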
Next they used a random sample of 800 bit.ly URLs by content type, category, interestingness. Split the two months of data they had and stuck themselves in the middle. First month became the past, then they tried to predict the “future” based on it. They were able to do pretty well on average, but that’s a big caveat. On an individual level, it was a random scatter.
Two factors that seem to matter most:
- past local influence
- number of followers
What doesn’t matter? Interestingness, volume of tweets.
However, you can’t just trigger a cascade by targeting influential Twitter users. Most cascades that influential users start don’t go anywhere. Influence is necessary but not sufficient. You should give up on predicting individual events. Instead, focus on the typical event size and try to optimize that.
Should Kim Kardashian be paid 10k to tweet or can you duplicate it with a broader, less influential base? It depends on how much acquiring the broader base costs. In most cases, the broader base is better unless the cost is very high.
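The trade-off is just arithmetic, so here is a back-of-the-envelope sketch. Every number below (the per-account costs and the average cascade sizes) is made up for illustration, as is the function name `expected_total_reach`; none of these figures came from the talk.

```python
def expected_total_reach(budget, cost_per_account, avg_cascade_size):
    """Expected total cascade size when the whole budget is spent on one
    type of account. All inputs are hypothetical illustration values."""
    n_accounts = budget // cost_per_account  # how many accounts we can pay
    return n_accounts * avg_cascade_size


budget = 10_000

# One celebrity tweet: expensive, with a large typical cascade (invented numbers).
celebrity = expected_total_reach(budget, cost_per_account=10_000, avg_cascade_size=2_000)

# Many ordinary users: cheap, with a tiny typical cascade each (invented numbers).
ordinary = expected_total_reach(budget, cost_per_account=10, avg_cascade_size=5)

# The same ordinary users, if acquiring each one costs five times as much.
ordinary_pricey = expected_total_reach(budget, cost_per_account=50, avg_cascade_size=5)
```

With these invented inputs the broad base wins at the low acquisition cost and loses at the higher one, which is the shape of the argument above.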
Everyone’s an influencer (.pdf)
Large cascades are rare, so it’s probably impossible to predict them or how they will start, and it’s better to trigger many small cascades.
Each of these experiments answers part of the question from the beginning but it’s still not all put together. Exp 1 showed how large networks are connected, exp 2 showed how social influence drives popularity and unpredictability, and the Twitter studies show that attention is highly concentrated but influence is still hard to predict at an individual level.
His book will be out next week: Everything is Obvious. Someone buy it for me.
There’s a question from the audience about negative influence but Duncan says it’s not something they’ve studied yet.