Oreo, Tumblr and a Network's Power to Amplify

Really, it was bigger than Oreo.

When Nabisco posted an image supporting gay pride, Tumblr blew it up. Users took the statement of a single snack manufacturer and made a cause that touched many companies.

In this, the second part of a trilogy, major brands find themselves roped into a conversation about love in America. Part one talked about how Oreo cannonballed into the social web by posting an image of a rainbow Oreo in support of gay pride. Part three will use the episode to highlight conversation dynamics unique to the Tumblr network.

It began with maskedman.

“Gay oreo? Oreo suppoert Gays/??” the user wrote, “Never evating cookie again. … Disgustedng. THis is AMERICA, not HOMERICA.”

The post, which would ultimately accumulate some 1,500 notes, landed a day after Oreo’s image and touched off a wave of support for the company.

One user, palahniukandchocolate, made a list.

“Dear people boycotting Oreos for supporting gay rights: The following companies also support gay rights,” she wrote, adding the names of 37 companies, among them Allstate, Gap, Nike and Starbucks.

A day later, monkaroo retooled the tactic:

“Yes, please boycott Oreo for their support of gay rights,” monkaroo wrote before invoking two dozen companies aligned with Oreo, “We’ll all appreciate you going on a diet … [D]o us all a favor, don’t take it all out on a festive cookie… Just stay home and boycott everything.”

The note from palahniukandchocolate ran close to 900 characters. monkaroo’s topped out over 1,800. Together, they used the freedom of Tumblr’s platform to find a community in an ideology. They grabbed allies — and by doing so, they blew up the question.

The notes caught.

By the evening of the 26th, palahniukandchocolate’s message was pulling down hundreds of reblogs per hour. Indeed, that night, the note would lay claim to 75 percent of Tumblr’s Oreo conversation.

Graph Showing Oreo Mentions Spike on Tumblr

Figure 1 presents hourly Tumblr activity about Oreos (blue) and hourly reblogs of user palahniukandchocolate (orange).

The action spread elsewhere. Starbucks had seen a median 11 tumbles per hour in the two weeks leading up to the 24th. Pepsi had seen 14. On the night of the 26th, palahniukandchocolate lifted both brands, driving each to a network peak of more than 400 posts per hour.

Microsoft also bounced, rising to the 400 peak from 15 posts per hour and holding triple digits as late as the afternoon of the 29th. Costco, with barely a pulse on the network the week before, found itself in 7,100 tumbles the day after the cookie.


Figure 2 presents hourly Tumblr activity around Costco, McDonald’s, Microsoft, Pepsi, Sears and Starbucks. Association with Oreo’s pride cookie drove heightened activity for each brand.

palahniukandchocolate named 37 brands in her defense of Oreo. For most, including Coca-Cola, Levi’s, Nike and Walgreens, that single association dominated the brand’s Tumblr presence in the second half of June.

Tumblr’s platform made that possible. Figure 3 shows four brands that bounced on Tumblr thanks to the Oreo affair. None saw pickup on Twitter in the wake of the image — the platform has no room for periphery.

Graph Showing Cookie Brand Mentions on Tumblr
Figure 3 presents hourly Twitter volumes for four brands that popped on Tumblr in the wake of Oreo’s image. Microsoft’s acquisition of Yammer drove the brand’s heightened activity pictured here.

In part, it’s not surprising that the Oreo story could cast so long a shadow over so many brands. Tumblr’s largely an extraprofessional platform; presence on the network requires personal connections between users and brands. Figure 4 presents average daily Tumblr volumes for corporate titans. The flows are thin, technology superbrands notwithstanding.

Graph of Brand Activity on Tumblr

Figure 4 presents average daily Tumblr activity around a subset of the 50 largest corporations by market capitalization (ranked Aug. 18, 2012).

Brands with little network presence risk leaving definition in the hands of others. And Tumblr encourages association: The platform provides flexibility in media and speeds the replication of conversation.

The series’ last installment dives into conversation dynamics on the network. If you like trace diagrams, this next one’s for you.

Twist, Lick, Dunk: A Tumblr Story

Oreo Showing Pride

Tumblr won’t soon forget the day America’s favorite cookie came out.

On June 25th, to promote the year of Oreo’s 100th birthday, Nabisco lent its cookie some currency: The company tweeted the image of a six-layered cookie, with crèmes the color of the rainbow, above a simple caption – “Pride.”

“We feel the Oreo ad is a fun reflection of our values,” a Kraft spokesman later told reporters. The cookie, the company said, illustrated ‘in a fun and playful way’ an issue that was making history.

The image lit up the social web. This post, and two that follow, explore conversations on Tumblr through the lens of Oreo. Part Two looks at how the episode touched other brands on the network. Part Three dives into the dynamics of Tumblr conversations and how they diverge from other platforms.

The image itself touched a vein. Opponents of marriage equality took to Oreo’s accounts on Facebook and Twitter to slam Nabisco and threaten boycott.

“[U]nliking oreo, cleaning out cupboard, changing buying habits, no more Oreo’s, and it’s parent company,” one user wrote.

“I will never eat an oreo again! ew!” said another.

Those comments, and others, drew counter-protests, among them:

“[W]onderful job Oreo on supporting equal rights, just for that, now I’ll buy a pack today.”

“I believe I’m going to go buy every package of Oreos I see when I go grocery shopping. Kudos!!”

Within hours, Oreo found itself the subject of some 7,500 tweets. The conversation ramped to midnight EST, when the brand was pulling back some 2,000 tweets per hour.
Graph Demonstrating Twitter Volume Around Pride Oreo
Figure 1 shows hourly Twitter volumes around Oreo between June 18 and July 2.

Tumblr followed on the 26th. In three hours that night, the company drew more than 300 textual posts on the network, double what the brand had done each day the week before.

The talk stayed political: “Way to go Kraft!,” one post read, “However it is also eye-opening to see how many people are proud to show their hate, or belief that all Americans do not deserve equal rights.”

Graph Showing Tumblr Volume Around the Pride Oreo
Figure 2 shows hourly Tumblr volumes around Oreo between June 18 and July 2.

By then, the story had spilled. ABC, NBC, Reuters and the Washington Post amplified news of the flap. A conservative family group urged supporters to look elsewhere for cookies. Meanwhile, the image was slowly amassing more than 60,000 Facebook comments and close to 300,000 likes. Two social analytics companies would later call that conversation overwhelmingly positive – for Oreo.

For days on Tumblr, the story echoed. Median hourly Twitter volumes had returned to normal by the fracas’ fourth day. But on Tumblr, a full week after Oreo’s image went live, chatter remained triple the cookie’s prior volume.

In that way, the image marked a breakthrough for Oreo on Tumblr. At peak, the pride cookie generated 2.6 times Oreo’s median Twitter volume from the week prior. For Tumblr, that figure was 19.8.
Graph Demonstrating Increase in Tumblr Traffic After the Pride Oreo
Figure 3 shows the ratio between hourly platform volume around Oreo and typical hourly platform volumes between June 18 and July 2.

Oreo had long been a social brand. Before the pride cookie, it counted 26 million Facebook fans and tens of thousands of Twitter followers. On Tumblr, the cookie already outstripped its rivals. And in a move that may help the company retain that lead, Oreo can rely on oreodailytwist.tumblr.com, the brand’s official Tumblr presence. Its first posted image? June 25 – the pride cookie.

Graph Showing Oreo Compared to Other Cookie Brands on Tumblr

Figure 4 shows Oreo’s Tumblr lead over major cookie brands in the United States between June 18 and July 2.

But Oreo’s Tumblr story rippled beyond the cookie alone. That broadening – a central quality of the Tumblr platform – has implications for brands linked by product, demographic or, in this case, ideology. Return for more in Part Two.

Taming The Social Media Firehose, Part III – Tumblr

In Part I, I discussed high-level attributes of the social media firehose. In Part II, I examined a single event by looking at activities from four firehoses for the earthquake in Mexico earlier this year. In Part III, I wrap up this series with some guidelines for using unique rich content from social media firehoses that may be less familiar. To keep it real, I used examples from the Tumblr firehose.

Since the Twitter APIs and firehoses have been available for years, you may be very familiar with many analysis strategies you can apply to the Twitter data and metadata.  I illustrated a couple of very simple ideas in the last post. With Twitter data and metadata, the opportunities to understand tweets in the context of time, timezone, geolocation, language, social graph, etc. are as big as your imagination.

Due to the popularity of blogging for both personal and corporate communication, many of you will also understand some of the opportunities of the WordPress firehose.  With the addition of firehoses of comments, you have the capabilities of connecting threads of conversation to realize another possible analysis strategy. “Likes” and Disqus “votes” provide additional hints about user reaction and engagement–yet another way to filter and understand posts and comments.

Why go to the effort and expense of adding a new firehose?
Investing the effort to learn and integrate a new firehose brings three benefits. Users of social networks choose to participate in Twitter, Tumblr or other social networks based on their affinities and preferences. Integrating additional active social media sources gives:

  1. Richer audience demographics
  2. More diverse perspective and preference
  3. Broader topic coverage.

Here’s an example.

Tumblr

The newest firehose from Gnip became available earlier in 2012. Tumblr’s exciting because the unique, rich content from Tumblr provides a complementary perspective and a distinct form of conversation. Tumblr is important because of the unique audience and modes of interaction common within this audience and platform.

With a firehose of over 50 million new posts a day from web users, Tumblr is a source with strong social sharing features and an active network of users where discussions can reach a large audience quickly.  Some Tumblr posts have been reblogged more than a million times and stories regularly travel to thousands of readers in a couple of days.

Before jumping into consuming the Tumblr firehose in the next section, it may help to understand some of what makes it different and valuable. These questions provide a useful framework when approaching any unfamiliar stream of social data.

What is unique about the Tumblr firehose?

1. Demographics. The user community on Tumblr skews young, over-indexing strongly in the 18-24 demographic of trend setters and cool hunters.

2. Communication and Activity Style. As you think about filtering and mining the Tumblr firehose, realize that conversations on Tumblr are often quite different from what you’ll find on other social platforms. As you start to interpret the data from Tumblr it’s important to note that Tumblr has an inside language. For example, many sites contain f**kyeah___ in their name and URL. When you start to home in on your topic, you will need to understand the inside language used for both positive and negative responses. Terms you consider negative on one platform may have positive connotations on another. Be sure to review a subset of your data to get a feel for the nuances before drawing larger conclusions.

3. Rich Content. Content is rich in that there are many types of media and a wide range of depth. Users will post audio, video, animated GIFs, simple photos as well as short and long text posts.

You’ll also see 7 different Post Types on Tumblr. These represent the different types of content that users can post on Tumblr. They break out as follows:

Table of Post Types on Tumblr

Table 1 – Tumblr post type breakdown.

To answer the questions, we often rely on filters based on text since these are the simplest filters to think about and create.  The textual data and metadata available in the Tumblr firehose include titles, tags and image captions in addition to the text of the body of the post. Including all of this content allows us to filter approximately 20% of the Tumblr firehose based on text. Additional strategies include looking at reblog and “like” activity, as well as reblog and “like” relationships between users.  More sophisticated strategies such as applying character or object recognition to images open up the tens of millions of activities daily for mining and exploration.

4. Rich Topics. In addition to diverse content forms, Tumblr has attracted many active conversations on a wide variety of topics. This content is often very complementary to other social media platforms due to differences in audience, tone, volume or perspective. With more than 20 billion total posts to date, there is content about almost anything you can imagine. Some examples include:

  • Brands. Any brand you can think of is being discussed right now on Tumblr. Big brands with an official presence on Tumblr include Coca-Cola, Nike, IBM, Target, Urban Outfitters, Puma, Huggies, Lufthansa, Mac Cosmetics and many more. NPR and the President of the United States have their own presences on Tumblr.
  • Fashion and Cosmetics. Because of the visual nature of the medium and cool-hunting audience it attracts, there is a large volume of content related to cosmetics and fashion.
  • Music and Movies. With Spotify music plugins and easy upload and sharing of visual content, pop culture plays a big role in the interests and attention of many of the active users on Tumblr. Information, analysis and fan content is rich, creative and travels through the community rapidly.

5. Reblogs and Likes. Tumblr is all about engagement! The primary user activities for interactions are Reblogs and Likes. Some entries are reblogged thousands of times in a day or two. When a user reblogs a post, the other user’s post is placed into the reblogger’s own blog along with any changes the reblogger makes. A list of all of the notes (likes, reblogs) associated with a post is appended to that post wherever it shows up on Tumblr. Each post activity record in the firehose can contain reblog info: a count, a link to the blog this entry was reblogged from and a link to the root entry. To build the note list that a user would see at the bottom of a liked or reblogged entry, you have to trace each entry in the stream (i.e. keep a history or know what you want to watch) or scrape the notes section of a page.

Filtering and Mining The Tumblr Firehose

Volume. There are a number of metrics we can use to talk about the volume of the Tumblr firehose. The three gating resources that we run up against most often are related to the network (bandwidth and latency) and storage (e.g. disk space). Tumblr activities are delivered compressed, so for estimating, the bandwidth and disk space requirements can be based on the same numbers. The Tumblr firehose averages about 900 MB/hour compressed volume during peak hours, falling to a minimum of 300 MB/hour during slower periods of the day.

To store the firehose on disk, plan on ~16 GB/day based on current volumes. Planning for bandwidth, you want headroom of 2-5 x average peak hourly bandwidth (4 to 10 Mbps) depending on your tolerance for disconnects during peak events.
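The arithmetic behind these planning numbers can be checked with a short sketch. It uses only the figures quoted above (900 MB/hour at peak, ~16 GB/day, 2-5 x headroom) and assumes decimal units (1 MB = 10^6 bytes, 1 GB = 1,000 MB). The function names are illustrative and not part of any Gnip tooling.

```python
def mb_per_hour_to_mbps(mb_per_hour):
    """Convert a volume in MB/hour to megabits per second."""
    bits_per_hour = mb_per_hour * 1_000_000 * 8  # assumes 1 MB = 10**6 bytes
    return bits_per_hour / 3600 / 1_000_000


def headroom_range_mbps(peak_mb_per_hour, low=2.0, high=5.0):
    """Bandwidth to provision for a low-to-high multiple of the peak rate."""
    base = mb_per_hour_to_mbps(peak_mb_per_hour)
    return base * low, base * high


peak_mbps = mb_per_hour_to_mbps(900)            # peak hourly volume from the text
low_mbps, high_mbps = headroom_range_mbps(900)  # 2x and 5x headroom
implied_avg_mb_per_hour = 16_000 / 24           # ~16 GB/day spread over 24 hours

print(f"peak rate: {peak_mbps:.1f} Mbps")
print(f"provision between {low_mbps:.1f} and {high_mbps:.1f} Mbps")
print(f"implied average: {implied_avg_mb_per_hour:.0f} MB/hour")
```

The implied daily average sits between the 300 MB/hour trough and the 900 MB/hour peak, which is a quick sanity check on the storage estimate.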

The other consideration is end-to-end network latency as discussed in Consuming the Firehose, Part II.  Very simplistically, latency can limit the throughput of your network (regardless of bandwidth) by using up too much time negotiating connections and acknowledging packets. (For a detailed calculation, see, for example, The TCP Window, Latency, and the Bandwidth Delay Product.)  The theoretical limit for 20 Mbps throughput is 50-70 ms (depends on TCP window size), but practically you will want to reliably observe less than this (< 50 ms) to realize reliable network performance.
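As a rough illustration of the latency limit, the usual single-connection approximation is throughput <= TCP window / round-trip time. The sketch below applies that approximation only; it ignores packet loss, slow start and window scaling behavior, and the window sizes in the example are assumptions chosen for illustration rather than values from the firehose documentation.

```python
def max_rtt_ms(window_bytes, target_mbps):
    """Largest round-trip time (ms) at which one TCP connection can still
    sustain target_mbps, using throughput <= window / RTT."""
    window_bits = window_bytes * 8
    return window_bits / (target_mbps * 1_000_000) * 1000


def max_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound on throughput (Mbps) for a given window and round-trip time."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


# Hypothetical window sizes: 64 KiB and 128 KiB.
for window in (64 * 1024, 128 * 1024):
    print(f"window {window} bytes: RTT must stay under "
          f"{max_rtt_ms(window, 20):.1f} ms for 20 Mbps")
```

Under this approximation, a window of about 128 KiB puts the 20 Mbps limit near 52 ms, which is in line with the lower end of the range quoted above; a smaller window tightens the latency budget proportionally.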

Metadata. A firehose is a time-ordered, near real-time stream of user activities. While this structure is clearly powerful for identifying emerging trends around brands or news stories, the time-ordered stream is not the optimal structure for looking at other things like the structure of social networks to discover, e.g., influencers. Fortunately, the Tumblr firehose activities contain a lot of helpful metadata about place, time, and social network to get answers to these questions.

Each activity has a post objectType as discussed above as well as links to resources referred to in the post such as image files, video files and audio files. Each activity has a source link that takes you back to the original post on Tumblr. If the post is a reblog, it will also have records like the JSON example below, describing the number of reblogs, the root blog and the blog this post reblogged.

"tumblrRebloggedFrom" :
    {
         "author" :
         {
               "displayName" : "A Glimpse",
               "link" : "http://onlybutaglimpse.tumblr.com/"
         },
         "link" : "http://onlybutaglimpse.tumblr.com/post/24141204872"
    },
"tumblrRebloggedRoot" :
    {
         "author" :
         {
                "displayName" : "Armed With A Mind",
                "link" : "http://lizard-skin.tumblr.com/"
         },
         "link" : "http://lizard-skin.tumblr.com/post/16004808098/the-nautilus-car-from-the-league-of-extraordinary"
    },

To assemble the entire reblog chain, you must connect the reblog activities within the firehose using this metadata.
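One way to connect the activities is sketched below. It assumes each activity has already been parsed into a Python dict, that the post's own URL sits under a top-level "link" key (an assumption; only the tumblrRebloggedFrom and tumblrRebloggedRoot records appear in the example above), and that you have kept a history of the activities you care about. The sample URLs are made up.

```python
from collections import defaultdict


def build_reblog_tree(activities):
    """Map each parent post link to the links of posts that reblogged it."""
    children = defaultdict(list)
    for activity in activities:
        parent = activity.get("tumblrRebloggedFrom", {}).get("link")
        if parent is not None:
            children[parent].append(activity["link"])  # "link" key is assumed
    return children


def chain_to_root(post_link, activities):
    """Walk from a post back toward the root via tumblrRebloggedFrom links."""
    parent_of = {
        a["link"]: a.get("tumblrRebloggedFrom", {}).get("link")
        for a in activities
    }
    chain = [post_link]
    seen = {post_link}
    while True:
        parent = parent_of.get(chain[-1])
        if parent is None or parent in seen:  # end of history, or a cycle
            break
        chain.append(parent)
        seen.add(parent)
    return chain


# Hypothetical activities: post 3 reblogged post 2, which reblogged post 1.
root = "http://a.example/post/1"
activities = [
    {"link": "http://b.example/post/2",
     "tumblrRebloggedFrom": {"link": root},
     "tumblrRebloggedRoot": {"link": root}},
    {"link": "http://c.example/post/3",
     "tumblrRebloggedFrom": {"link": "http://b.example/post/2"},
     "tumblrRebloggedRoot": {"link": root}},
]

tree = build_reblog_tree(activities)
chain = chain_to_root("http://c.example/post/3", activities)
print(chain)
```

The walk stops at the last link it knows about, so a chain is only as complete as the history you have stored; comparing the final link with the tumblrRebloggedRoot link is one way to tell whether intermediate reblogs are missing.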

Additional engagement metadata is available in the form of likes (hearts in the Tumblr interface) in a separate Tumblr engagement firehose.

Tumblr Likes Metadata

Non-Text Based Filters. Not all non-text post types have enough textual context (captions, title and tags) to identify a topic or analyze sentiment through simple text filtering. You will want to develop strategies for dealing with some ambiguity around the meaning of posts with very little text content. This ambiguity is hard to reduce unless you have audio or image analysis capabilities (e.g. OCR or audio transcription). Approximately 20% of all posts (about 15M activities per day) can be filtered effectively with text-based filtering of text, URL text, tags and captions.
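A minimal sketch of the text-based filtering described above might look like the following. The key names ("title", "body", "caption", "tags") are hypothetical stand-ins for wherever those values live in a parsed activity, and simple substring matching is used here only for illustration; it is not how Gnip's own filtering works.

```python
TEXT_FIELDS = ("title", "body", "caption")  # hypothetical key names


def searchable_text(activity):
    """Join every text-bearing field of an activity into one lowercase string."""
    parts = [str(activity.get(field, "")) for field in TEXT_FIELDS]
    parts.extend(activity.get("tags", []))
    return " ".join(parts).lower()


def matches_any(activity, keywords):
    """True if any keyword appears in the activity's title, body, caption or tags."""
    text = searchable_text(activity)
    return any(keyword.lower() in text for keyword in keywords)


# Hypothetical activities: a photo with no text, and a photo with one tag.
untagged_photo = {"objectType": "photo", "caption": ""}
tagged_photo = {"objectType": "photo", "caption": "", "tags": ["Oreo", "pride"]}

print(matches_any(untagged_photo, ["oreo"]))
print(matches_any(tagged_photo, ["oreo"]))
```

The untagged photo shows the ambiguity problem directly: with no caption, title or tags, a text filter has nothing to match, so such posts need image or audio analysis or reblog context to be classified.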

Memes. Another consideration related to the Tumblr language is that official brand sites as well as many bloggers tend to promote a style or overall image more than providing a catalog of particular products. As a result, e.g., you will match the brand name with a lot of cool stuff, but may see specific product names and descriptions much less frequently. There are many memes within Tumblr that will lead you to influencers and sentiment, but looking at “catalog” terms won’t be the most effective path.

I hope I have uncovered some of the mysteries of successfully consuming social media firehoses.  I have only suggested a handful of questions one might try to answer with the social media data. The community of professionals providing text analysis, image analysis, machine learning for prediction, classification and recommendation, and many other wonders is continuing to invent and refine ways to model and predict real-world behavior based on billions of social media interactions.  The start of this process is always a great question.  Best of luck (and the benefits of all of Gnip’s experience and technology) to you as you jump into consuming the social media firehose.

Full Series:

Taming The Social Media Firehose, Part I – High-level attributes of a firehose

Taming The Social Media Firehose, Part II – Looking at a single event through four firehoses

Taming The Social Media Firehose, Part III – Tumblr

 

Tumblr Firehose Now Available Exclusively from Gnip

I’m thrilled to announce that the full firehose of public Tumblr posts is now available exclusively from Gnip. Tumblr is one of the fastest growing social networks in the world. Much of this growth is fueled by the enormous number of conversations that are unique to the Tumblr community. These conversations cover a huge range of subjects, from movies, TV shows and fashion to business, apparel and consumer products. Check out these stats to get a feel for the volume of discussion on Tumblr:

  • 50 million new posts every day
  • 15 billion page views every month
  • 20 billion total posts
  • 300% traffic growth last year

While some social platforms react quickly to news and other events, Tumblr conversations often spread around concepts and trends. Take the example of Urban Outfitters where a photographer posted a picture to her personal Tumblr of a piece from one of their new collections. That post received over 1,000 notes and almost no mention elsewhere. In the case of Land Rover, the company posted a picture of a dog riding in a Land Rover to their Tumblr that received more than 5,000 notes and very little mention on other networks.

It doesn’t take a large leap to see the impact this type of information can have on brand management and product development. The conversations on Tumblr are rich in images and discussion about brands and products, from simply sharing a picture of a favorite pair of shoes to reblogging news about a favorite brand. And given the highly social nature of the Tumblr community, these discussions move quickly and broadly through the community. You often see posts that are shared tens of thousands of times. For brands, every conversation matters and access to the full firehose ensures they won’t miss a thing.

We’re excited to be able to offer Tumblr to our customers and can’t wait to see what other intriguing use cases they find for this data.

Drop us a line at sales@gnip.com to learn more.