Social Data vs Social Media

One area I see a lot of confusion about is the difference between social media and social data. I come from a social media background and use social media in marketing, so I can see where the confusion comes from.

The easiest way to think about it in plain English:

  • Social Media: User-generated content in which one user communicates and expresses themselves, and that content is delivered to other users. Examples of this are platforms such as Twitter, Facebook, YouTube, Tumblr and Disqus. Social media is delivered through a great user experience and is focused on sharing and content discovery. It also offers both public and private experiences, including the ability to share messages privately.

  • Social Data: Expresses social media in a computer-readable format (e.g. JSON) and shares metadata about the content to help provide not only content, but context. Metadata often includes information about location, engagement and links shared. Unlike social media, social data is focused strictly on publicly shared experiences.

Boiled down: social media is readable by humans and made for human interaction, while social data is social media that is readable by computers.

Let’s look at a Tweet in the form of social media and in the form of social data to show exactly what I’m talking about.

Looking at this Tweet from Gnip, we can see that it uses the #BigBoulder hashtag, includes a Bit.ly link to our Storify page, has 73 retweets and 3 favorites, and shows the date and time it was posted.

 

Now let’s look at the structure of a Tweet when it is received from an API. The payload below is Dave Heal’s retweet of that Tweet; Gnip’s original Tweet is nested inside it under "object".


{
   "body": "RT @gnip: Thrilled to welcome all #BigBoulder attendees! Watch the social story unfold on our Storify page. http://t.co/ZzqUMfJz",
   "retweetCount": 71, 
   "generator": {
      "link": "http://twitter.com", 
      "displayName": "web"
   }, 
   "gnip": {
      "klout_score": 53, 
      "matching_rules": [
         {
            "tag": "old krusty tweet", 
            "value": "thrilled to welcome all attendees"
         }
      ], 
      "language": {
         "value": "en"
      }, 
      "urls": [
         {
            "url": "http://t.co/ZzqUMfJz", 
            "expanded_url": "http://storify.com/Gnip/big-boulder"
         }
      ]
   }, 
   "object": {
      "body": "Thrilled to welcome all #BigBoulder attendees! Watch the social story unfold on our Storify page. http://t.co/ZzqUMfJz",
      "generator": {
         "link": "http://www.tweetdeck.com", 
         "displayName": "TweetDeck"
      }, 
      "object": {
         "postedTime": "2012-06-20T18:07:13.000Z", 
         "summary": "Thrilled to welcome all #BigBoulder attendees! Watch the social story unfold on our Storify page. http://t.co/ZzqUMfJz",
         "link": "http://twitter.com/gnip/statuses/215506104082366465",
         "id": "object:search.twitter.com,2005:215506104082366465", 
         "objectType": "note"
      }, 
      "actor": {
         "preferredUsername": "gnip", 
         "displayName": "Gnip, Inc.", 
         "links": [
            {
               "href": "http://gnip.com", 
               "rel": "me"
            }
         ], 
         "twitterTimeZone": "Mountain Time (US & Canada)", 
         "image": "http://a0.twimg.com/profile_images/1347133706/Gnip_logo-73x73_normal.png",
         "verified": true, 
         "location": {
            "displayName": "Boulder, CO", 
            "objectType": "place"
         }, 
         "statusesCount": 971, 
         "summary": "Gnip is the leading provider of social media data for enterprise applications, facilitating access to dozens of social media sources through a single API",
         "languages": [
            "en"
         ], 
         "utcOffset": "-25200", 
         "link": "http://www.twitter.com/gnip", 
         "followersCount": 3335, 
         "favoritesCount": 108, 
         "friendsCount": 384, 
         "listedCount": 212, 
         "postedTime": "2008-10-24T23:22:09.000Z", 
         "id": "id:twitter.com:16958875", 
         "objectType": "person"
      }, 
      "twitter_entities": {
         "user_mentions": [], 
         "hashtags": [
            {
               "indices": [
                  24, 
                  35
               ], 
               "text": "BigBoulder"
            }
         ], 
         "urls": [
            {
               "indices": [
                  98, 
                  118
               ], 
               "url": "http://t.co/ZzqUMfJz", 
               "expanded_url": "http://bit.ly/MumrVJ", 
               "display_url": "bit.ly/MumrVJ"
            }
         ]
      }, 
      "verb": "post", 
      "link": "http://twitter.com/gnip/statuses/215506104082366465", 
      "provider": {
         "link": "http://www.twitter.com", 
         "displayName": "Twitter", 
         "objectType": "service"
      }, 
      "postedTime": "2012-06-20T18:07:13.000Z", 
      "id": "tag:search.twitter.com,2005:215506104082366465", 
      "objectType": "activity"
   }, 
   "actor": {
      "preferredUsername": "daveheal", 
      "displayName": "Dave Heal", 
      "links": [
         {
            "href": "http://daveheal.com", 
            "rel": "me"
         }
      ], 
      "twitterTimeZone": "Mountain Time (US & Canada)", 
      "image": "http://a0.twimg.com/profile_images/1755125722/photo_2_normal.JPG", 
      "verified": false, 
      "location": {
         "displayName": "Boulder, CO", 
         "objectType": "place"
      }, 
      "statusesCount": 5657, 
      "summary": "Boulder resident. Rochester NY native. Michigan Law graduate. Copyright enthusiast. Liker of sports. DFW fanboy. CrossFitter. Work @Gnip. ",
      "languages": [
         "en"
      ], 
      "utcOffset": "-25200", 
      "link": "http://www.twitter.com/daveheal", 
      "followersCount": 671, 
      "favoritesCount": 28, 
      "friendsCount": 292, 
      "listedCount": 26, 
      "postedTime": "2009-03-02T01:18:39.000Z", 
      "id": "id:twitter.com:22432819", 
      "objectType": "person"
   }, 
   "twitter_entities": {
      "user_mentions": [
         {
            "indices": [
               3, 
               8
            ], 
            "id": 16958875, 
            "screen_name": "gnip", 
            "id_str": "16958875", 
            "name": "Gnip, Inc."
         }
      ], 
      "hashtags": [
         {
            "indices": [
               34, 
               45
            ], 
            "text": "BigBoulder"
         }
      ], 
      "urls": [
         {
            "indices": [
               108, 
               128
            ], 
            "url": "http://t.co/ZzqUMfJz", 
            "expanded_url": "http://bit.ly/MumrVJ", 
            "display_url": "bit.ly/MumrVJ"
         }
      ]
   }, 
   "verb": "share", 
   "link": "http://twitter.com/daveheal/statuses/215509188481253376", 
   "provider": {
      "link": "http://www.twitter.com", 
      "displayName": "Twitter", 
      "objectType": "service"
   }, 
   "postedTime": "2012-06-20T18:19:29.000Z", 
   "id": "tag:search.twitter.com,2005:215509188481253376", 
   "objectType": "activity"
}

This is social data. Same content, very different format, very different context and very different end user.
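To make “readable by computers” concrete, here is a minimal Python sketch that loads a trimmed copy of the activity above with the standard `json` module and pulls out a few of the fields discussed below. The `summarize` and `entity_text` helpers and the choice of fields are illustrative assumptions, not part of Gnip’s API. The sketch also shows what the `indices` values are for: they are [start, end) character offsets into `body`, so slicing recovers the exact hashtag and mention text. (The retweet’s offsets are the original Tweet’s shifted by the 10-character `RT @gnip: ` prefix, e.g. 24 becomes 34.)

```python
import json

# Trimmed copy of the retweet activity shown above (only the fields used here).
ACTIVITY_JSON = """
{
  "body": "RT @gnip: Thrilled to welcome all #BigBoulder attendees! Watch the social story unfold on our Storify page. http://t.co/ZzqUMfJz",
  "verb": "share",
  "actor": {"preferredUsername": "daveheal"},
  "gnip": {
    "klout_score": 53,
    "language": {"value": "en"},
    "urls": [
      {"url": "http://t.co/ZzqUMfJz",
       "expanded_url": "http://storify.com/Gnip/big-boulder"}
    ]
  },
  "twitter_entities": {
    "user_mentions": [{"indices": [3, 8], "screen_name": "gnip"}],
    "hashtags": [{"indices": [34, 45], "text": "BigBoulder"}]
  }
}
"""


def entity_text(body: str, entities: list) -> list:
    """Slice each entity out of the body using its [start, end) indices."""
    return [body[start:end] for start, end in (e["indices"] for e in entities)]


def summarize(activity: dict) -> dict:
    """Collect a few commonly used fields from a Gnip-style activity."""
    body = activity["body"]
    entities = activity["twitter_entities"]
    enrichments = activity["gnip"]
    return {
        "user": activity["actor"]["preferredUsername"],
        "is_retweet": activity["verb"] == "share",  # the original Tweet uses "post"
        "language": enrichments["language"]["value"],
        "klout": enrichments["klout_score"],
        "hashtags": entity_text(body, entities["hashtags"]),
        "mentions": entity_text(body, entities["user_mentions"]),
        "expanded_urls": [u["expanded_url"] for u in enrichments["urls"]],
    }


activity = json.loads(ACTIVITY_JSON)
print(summarize(activity))
```

The same few lines would work on any activity in this format, which is the practical difference from the human-facing Tweet: nothing has to be read off a screen.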

So what exactly goes into the social data of a Tweet? To start, here is some of the metadata you’re seeing.

  • Language identification — This Tweet’s language is identified as English. Language identification is important for social media monitoring so that companies can correctly monitor for the content they want.

  • URL expansion — This resolves, or traces, a shortened URL to the final URL that a consumer would see in their browser window. In this case, the shortened link resolves to http://storify.com/Gnip/big-boulder, the link we shared using Bitly.

  • Content — Gnip shows the full content of the Tweeted message, as well as metadata about the Tweet, such as the hashtags and URLs used, the users that were mentioned, and when it was posted.

  • User — Gnip provides the display name, username, user’s stated location and additional bio information of the Tweeter. This is the information that users decide to share when signing up for an account.

  • Klout scores — Gnip can also provide a user’s Klout score as additional metadata, so if one of our clients only wanted to see Tweets from users with a Klout score of 30 or higher, they could do that.
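The language and Klout bullets above amount to a filter over enrichment fields. Here is a minimal client-side Python sketch, assuming activities have already been delivered and parsed into dictionaries shaped like the `gnip` block in the payload above. The threshold of 30 comes from the example in the text; the `filter_activities` name and the sample records are hypothetical, and this is not how Gnip’s own filtering rules are written.

```python
from typing import Iterable, Iterator


def filter_activities(
    activities: Iterable[dict],
    min_klout: int = 30,
    language: str = "en",
) -> Iterator[dict]:
    """Yield activities whose Gnip enrichments meet both thresholds.

    Activities missing a Klout score or a language tag are skipped
    rather than guessed at.
    """
    for activity in activities:
        enrichments = activity.get("gnip", {})
        klout = enrichments.get("klout_score")
        lang = enrichments.get("language", {}).get("value")
        if klout is None or lang is None:
            continue
        if klout >= min_klout and lang == language:
            yield activity


# Hypothetical sample activities, trimmed to the fields the filter reads.
sample = [
    {"id": "a", "gnip": {"klout_score": 53, "language": {"value": "en"}}},
    {"id": "b", "gnip": {"klout_score": 12, "language": {"value": "en"}}},
    {"id": "c", "gnip": {"klout_score": 45, "language": {"value": "es"}}},
    {"id": "d", "gnip": {"language": {"value": "en"}}},
]

print([a["id"] for a in filter_activities(sample)])  # ['a']
```

Skipping activities that lack an enrichment is a conservative choice; an application that prefers recall over precision could let them through instead.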

Beyond Twitter data, Gnip offers social data from Tumblr, Disqus, Automattic (WordPress) and other publishers, each with its own unique metadata and enrichments. In addition to enrichments, Gnip offers format normalization. This means that whether you’re looking at a WordPress blog post or a Tweet, the data is normalized regardless of platform. For example, date and location are formatted the same way and located in the same place within the JSON payload, making it easy to consume and parse data from multiple sources.
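Format normalization can be sketched as one small adapter per source that maps a native payload onto a shared shape. In this Python sketch the two input shapes (a Twitter-style `created_at` string and a blog post with a `date_gmt` field) and all function names are hypothetical stand-ins, not the publishers’ actual APIs or Gnip’s code; the output timestamp format mirrors the `postedTime` values in the payload above.

```python
from datetime import datetime, timezone

# Output format matching the "postedTime" values in the payload above.
POSTED_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.000Z"


def normalize_tweet(raw: dict) -> dict:
    """Map a hypothetical Tweet-like payload onto the shared shape."""
    posted = datetime.strptime(raw["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "source": "twitter",
        "postedTime": posted.astimezone(timezone.utc).strftime(POSTED_TIME_FORMAT),
        "location": raw.get("user_location"),
        "body": raw["text"],
    }


def normalize_blog_post(raw: dict) -> dict:
    """Map a hypothetical blog-post payload (GMT date, 'place' field) onto the shared shape."""
    posted = datetime.strptime(raw["date_gmt"], "%Y-%m-%d %H:%M:%S")
    posted = posted.replace(tzinfo=timezone.utc)
    return {
        "source": "blog",
        "postedTime": posted.strftime(POSTED_TIME_FORMAT),
        "location": raw.get("place"),
        "body": raw["content"],
    }


NORMALIZERS = {"twitter": normalize_tweet, "blog": normalize_blog_post}


def normalize(source: str, raw: dict) -> dict:
    """Dispatch to the adapter for the given source."""
    return NORMALIZERS[source](raw)


# Hypothetical sample payloads.
tweet = {
    "created_at": "Wed Jun 20 18:07:13 +0000 2012",
    "text": "Thrilled to welcome all #BigBoulder attendees!",
    "user_location": "Boulder, CO",
}
post = {
    "date_gmt": "2012-06-20 18:19:29",
    "content": "Notes from Big Boulder",
    "place": "Boulder, CO",
}

records = [normalize("blog", post), normalize("twitter", tweet)]
# Every record now has the same keys and timestamp format, so one sort works across sources.
for record in sorted(records, key=lambda r: r["postedTime"]):
    print(record["source"], record["postedTime"], record["location"])
```

Keeping one adapter per source means adding a publisher only adds a function and a dictionary entry; downstream code keeps reading the same keys.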

Finally, a big difference is in how people use social data vs. social media. Social data powers social media monitoring and analytics companies. It’s combined with other data sets in business intelligence, used by hedge funds as part of their trading algorithms, and even used to get a top-level view during a natural disaster.

Welcoming Estimize, Gnip’s Latest Premium Publisher

At Gnip, we’ve always had a theory that financial firms would be hungry for social data. What has happened has surpassed our expectations, though; we’ve seen an incredible hunger from firms wishing to use social data as a news source, a sentiment signal and a research set.

One of the ways we’ve measured how successfully this sector uses social data is by how often our customers ask for additional social data sources. One of the most consistent requests we’ve heard has been for Estimize, a crowdsourced earnings estimates platform that provides open-sourced financial estimates with incredible transparency, making it a valuable and unique set of social data.

We’re excited to now be the exclusive provider of Estimize’s streaming data, delivering our trading customers yet another competitive edge driven by social interaction. Estimize has a community of 2,50 vetted analysts that create estimates that beat comparable Wall Street reports more than 67% of the time. In the few short years since Estimize was founded, it has become a force, and we believe this dataset, and its power, will continue to grow substantially over time.

The way the financial industry has incorporated social data from StockTwits, Twitter and now Estimize proves the utility of social data, and we’re excited to be at the vanguard of that.

Data Story: Mohammad Shahangian on Pinterest Data Science

At Gnip, we believe the value of social data is unlimited. Data Stories is how we bring this belief to life by showcasing how social data is used. This week we’re interviewing data scientist Mohammad Shahangian of Pinterest about how the data science team works at Pinterest, surprising uses of Pinterest, and data science as a career path. You can follow him on Pinterest at pinterest.com/mshahang.

Data Scientist at Pinterest

1. What do you see as your role as the data scientist for Pinterest?

The company’s focus is on helping millions of people discover things they love and get inspiration to go do those things in their life. For me, that means analyzing the rich data that is created by the millions of people interacting with billions of pins from across the web each day. I evaluate this data and provide insights that make data actionable. My team also prototypes and validates ideas, performs deep analysis and builds tools that allow us to answer our most frequent questions in seconds. We work with every team to answer Pinterest’s biggest questions and ensure that each decision positively impacts Pinners over the long term.

For example, we take a business question like “How should our web, tablet and phone experiences differ?” and present the results as insights like, “Many users use the mobile apps in the morning and again at night, but prefer the website during the day” and “Users prefer to use mobile apps to casually discover new content, whereas they use the web to curate and organize content.” We then work with the design and product teams to build features around these insights and measure their impact.

2. What are some of your favorite ways people use Pinterest that others wouldn’t expect?

What makes Pinterest unique is that it’s a tool, and the users really define its use cases. For me, Pinterest was really helpful when I was planning my wedding, and it made perfect sense to use it as a collaborative office shopping list. I would never have thought to use it as a tool for:

A collection of Stop signs from around the world
Daily Grommet gets their community to collaborate on a board to see things they want to sell
Vintage Driving - a collaborative board where users pin their favorite vintage cars
GE Badass machines featuring GE tech
Madewell’s Rainbow board
Michelle Obama’s MyPlate Recipes encourages healthy eating
Stunning virtual collections of minerals and shipwrecks
The “365 Days of Pinterest” challenge. She made a Pinterest project every day for a year!
Sammy Sosa awesomeness
Sony shows off their technology with food pictures shot with a Sony Camera
Pantone announces the color of the year
The National Pork Board

3. What category do you see as the most viral on Pinterest?

DIY and recipe pins generally go viral year-round. Around the holidays, holiday-themed content across all categories tends to get the most traction.

4. How has data science added value to Pinterest?

We have this internal value we refer to as “knit.” It means that we have an open, curious culture where everyone in different disciplines—from engineering and design to marketing to community—works together. Data science is at the core of that. The search, recommendations and spam teams apply data science to improve the quality of content we put in front of Pinners. This is only a subset of how we apply data though; most of the decisions we make at Pinterest are actually backed by data.

Data is a universal language that teams across the company use to collaborate and make decisions. Each team has a set of performance metrics, and we hold a weekly meeting to understand the impact that each area is having on company-wide metrics. As data scientists, we do more than just analyze data; we create rich data sources that we make available to other teams so they can do their own analysis. More than half of Pinterest employees run MapReduce jobs via Hive. Our metrics dashboards are accessible to everyone, and our core metrics are emailed daily to the entire team. We also share our data studies and insights with the whole team.

We also use data just for fun. During our weekly happy hour, we share a weekly Data Fun Fact with the team. We present the fact in the form of a multiple choice question and have the team vote on the answer. For example, we asked, “How many days before Valentine’s day does the query ‘Valentine’s day ideas’ increase the most: 1, 3, 5 or 7 days?” (Hint for the curious reader: two*three/two).

5. What do you think someone should know before becoming a data scientist at a major web company like Pinterest?

I would say go for it! If you are hungry to extract value from real world data, you’re really going to enjoy it. I know that for a lot of really talented people in academia the only thing standing between them and the opportunity to solve a really interesting problem is the lack of rich data. My experience at Pinterest has been the exact opposite. Our team can’t grow fast enough to tap into a world of valuable insights that are sitting dormant within billions of records somewhere in the cloud.


Commercial Evolution of Social Networks

Over the past four years, Gnip has seen many social services come and go. Not surprisingly, a pattern has emerged in how they evolve and in the degree to which our customers need their public data. There are generally three distinct phases a social service goes through, and how the service does in each phase affects how it ultimately participates in the broader public social data ecosystem, where it can complete a full commercial cycle. That cycle combines consumer use (often buying intent, or expression) with commercial engagement (identifying needs during a natural disaster, or ad buying).

Phase 1: Consumer Engagement
A social service must engage us, the end users/consumers. Whether via a homegrown social graph or by leveraging someone else’s (e.g. Facebook Connect), a social service needs users in order to become useful. From there, those users need to participate in self-expression (from posting a comment to retweeting a Tweet) and generate activity on the service. There are a variety of ways to compel us users to engage in a social service, but the social service itself is solely responsible for the first experience. The vision of the service’s founders yields a web app or mobile interface that allows us to take action, leveraging the expressions laid out by the app itself (e.g. sharing a photo). If users like the expressions, discovery methods, and sense of “connectedness,” you’ve got a relevant social service on your hands.

Phase 2: APIs; Outsourcing Engagement
At some point a successful social service realizes the potential for outsourcing the expression metaphors that make the service successful and useful, and it constructs an API that allows others to RESTfully engage with the service. In some instances the API is read-only; in others it is write-only, and sometimes it is both. What is key is that nine times out of ten, the API is meant to drive core service engagement via other user-facing applications. A classic example of this would be the zillions of non-Twitter Inc. clients that “Tweet” on our behalf every day. One look at the endless number of Tweet “sources” that flow through the Firehose and you’ll realize this engagement potential.
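In the Tweet payload from the “Social Data vs Social Media” post, the client behind an activity appears in its `generator` field (`web` for Dave Heal’s retweet, `TweetDeck` for Gnip’s original Tweet). A minimal Python sketch of tallying those sources across a stream with `collections.Counter` follows; the `count_sources` name and the sample records are hypothetical, and activities without a `generator` are counted as `unknown`.

```python
from collections import Counter
from typing import Iterable


def count_sources(activities: Iterable[dict]) -> Counter:
    """Tally the client application ("generator") behind each activity."""
    counts: Counter = Counter()
    for activity in activities:
        name = activity.get("generator", {}).get("displayName", "unknown")
        counts[name] += 1
    return counts


# Hypothetical sample; real activities carry many more fields.
sample = [
    {"verb": "share", "generator": {"displayName": "web"}},
    {"verb": "post", "generator": {"displayName": "TweetDeck"}},
    {"verb": "post", "generator": {"displayName": "web"}},
    {"verb": "post"},
]

for source, n in count_sources(sample).most_common():
    print(f"{source}: {n}")
```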

The exceptional API is one that has broader social data ecosystem consumption in its DNA. Typical social services consider themselves the center of the universe and assume that not only will they capture all consumer engagement, they will also be the root of all broader ecosystem engagement. However, success with consumer engagement does not guarantee commercial engagement; not by a long shot.

These days, some services execute phases 1 and 2 simultaneously.

Phase 3: Activity Transparency; Commercial Engagement
Allowing other applications and developers to inject activities into the core service is obviously valuable; however, it is only part of the picture. Social services with broad social and commercial impact have achieved it by addressing commercial needs for complete, raw activity availability. For example, in order to deploy resources effectively in a disaster relief scenario, someone needs to make their own determination as to what victims need, where they are located, and what the general conditions surrounding the event are. A social service that limits access to the activities taking place on it, by definition, yields an incomplete picture to downstream commercial consumers of the content. The result is a fragmented and hobbled experience for commercial engagement.

Another key component of commercial engagement is realizing that the ecosystem of data analytics and insights is well established, complex, and interwoven. Massive investments have been made in the market over the years, and brands want to leverage them. It is illogical for a social service to address the endless needs of the enterprise by building its own tools. Attempts to supplement this market come at the potential expense of losing focus on building a great consumer experience.

The most impactful, useful, and valuable social services that Gnip customers leverage for their needs (ad buying, campaign running, stock trading, disaster relief), are those that acknowledge that they are not an island in the ecosystem. They complete the cycle by providing unfettered access to one of their most significant assets. In trade, the relevance of the social service itself is maximized because commerce can engage with it.

A good example of how impactful this transparency can be is Twitter. Consider how Twitter is used across new, as well as traditional, media. They’ve completed the cycle with a strong offering of Phase 3.

Not all three phases are required for success, but all three are indeed required for success in the broader public commercial social data ecosystem.

Data Stories: Brooke Fisher Liu on Using Social Media in Natural Disasters

Data Stories is Gnip’s project to tell the stories of how social data is being used. This week we’re interviewing Brooke Fisher Liu from the University of Maryland about her research on how people use social media in natural disasters (PDF). You can follow Brooke on Twitter at @Bfliu. (You can also see our data scientist’s post on Twitter’s reaction to an earthquake in Mexico.)


Brooke Fisher Liu (photo courtesy of Anne McDonough)

1. When the wildfires broke out in Boulder, I found Twitter to be the best source of information hands down. What kind of information do you see people communicating about natural disasters?

During natural disasters people tend to use social media for four interrelated reasons: checking in with family and friends, obtaining emotional support and healing, determining disaster magnitude, and providing first-hand disaster accounts. A consistent research finding is that people are less likely to follow official, government sources on social media than their friends and family during disasters. I think that may change over time as government sources become more savvy about effectively using social media during disasters.

2. How is curated content such as Storify changing how people communicate during disasters?

This is one area where the research hasn’t caught up with practice yet. However, I think that social media sites that curate content such as Storify, Pinterest, or even Instagram are going to be major players in disaster communication in the future. One of the reasons people don’t turn to social media for disaster information is that the quantity of information is difficult to sift through and verify. Sites that curate content help cut through the sea of online information, and also provide a familiar, reliable source of information through online connections established before disasters.

3. You talked about people mobilizing on social media after natural disasters in your report. Do you ever see people respond in real time?

Absolutely. Real-time communication is one of the primary draws of social media during disasters. There are multiple examples of social media being the first source of disaster information such as for the 2011 Tuscaloosa tornadoes and the 2008 Mumbai terrorist attacks.

4. What surprised you the most about how people were using social media during natural disasters?

By far the biggest surprise is that people still turn to traditional media sources, especially broadcast journalism, as the most accurate source of disaster information. So, while they may first turn to social media, they still prefer traditional media during disasters. I think this may change over time, but it certainly was a surprise for me. Of course, journalists often rely on social media for disaster information, and I think over time we’ll see the distinction between traditional media and so-called new media blur even more.

5. How do you think the use of social media in natural disasters will evolve?

I think over time people will view social media as more trustworthy and thus turn to it as their primary source of information. I also think social media will continue to play a large role in facilitating disaster recovery by helping people connect with each other and rebuild communities. “Official sources” such as governments and the media will increasingly enhance their social media presence before disasters, which likely will position them to be not only the first, but also most trustworthy social media sources down the road. Perhaps most importantly I think social media will continue to surprise us by providing new communication capabilities during disasters that we can’t currently predict.


Plugged In To Gnip: Shining A Light On Social Data


Today we announced our Plugged In To Gnip partnership program. Although this is an important milestone for our company, we believe this program clearly marks the beginning of a new era in the quickly maturing social data ecosystem, and that the benefits of today’s announcement will be felt by the end users of social data analysis for years to come.

As we’ve often said, we believe social data has unlimited value and near limitless application. Countless companies, governments, and researchers are now making critical decisions based upon this data. Much of the emphasis across these applications to date has been on the analysis and insights layer. At the same time, there has been clear recognition that the analysis and insights derived from these various solutions are only as good as the underlying social data they are built upon.

In the early days of the ecosystem, the options for accessing reliable, sustainable, and comprehensive social data were very limited. Solution providers that were building upon a sound data layer didn’t want to reveal their proprietary data acquisition secrets. Providers that did not have a sound solution wanted to avoid the data discussion altogether. As a result, companies were talking about their solutions without shining a light on the critical data layer of the equation.

If social data is going to reach its full potential, the underlying data must be reliable, sustainable, and complete. As an industry we must shine a light on social data so that the data layer is analyzed and scrutinized as much as the application itself. As the world’s largest provider of social data, Gnip has a unique view of the ecosystem and of the organizations that are committed to the highest level of social data integrity. At its core, the Plugged In program is a way for us to collaborate with these advanced data organizations to keep driving the ecosystem forward.

There are lots of benefits to partners participating in the program including early access to new data and new features. But, the big winner is the end user. Plugged In To Gnip partners can confidently certify to their end users that they have complete and authorized access to the best social data in the world.

 

Data Stories: Annicka Campbell on the Digital Love Project

While I was perusing social data proposals for SXSW panels, I came across the Digital Love Project submission. “The Digital Love Project is a study of the full life cycle of romantic partnerships through ethnographic analysis of digital-social picture and video sharing. Through review of thousands of visual-social data points – from first glance to ultimately tying the knot – we observe trends about romantic partnerships that change the way we think about driving relevance.” I thought this would make a great Data Story, so I reached out to one of the researchers, Annicka Campbell of SapientNitro, for an interview. 

1. What was the genesis of deciding to do the Digital Love Project?

My co-author Melissa Read and I both have backgrounds in the social sciences – I was trained as an anthropologist, and Melissa has a PhD in digital marketing psychology. We’re both fascinated by the behavioral implications of social media, and so a few months ago we decided to embark on an ethnographic analysis of romantic partnerships as expressed through social. Our theory was that our findings would illuminate new ways for marketers to build more meaningful relationships and experiences for consumers online. It’s been a pretty fun (and often funny) experience, and we’re hoping that this combination of marketing insight and humor could translate really well for SXSW Interactive attendees.

Annicka Campbell of the Digital Love Project


2. How has displaying relationships on social media changed how people behave?

Great question. Since the term ‘relationship’ is so broad, we’ve focused on analyzing people in serious romantic relationships. We found that the relationship sharing process has become somewhat standardized; we identified a set of relationship milestones that couples are almost expected to share on social media. This could mean posting joint vlogs or “selfies,” crowdsourcing engagement proposals, or having wedding attendees post with a customized hashtag. We’ve even seen instances of people updating their Facebook relationship statuses at the altar, which is obviously on the far end of the spectrum. Call them the digital rules of engagement, perhaps. We actually think that these behaviors are making romantic relationships more interactive and collaborative than ever before. Of course, this behavior can have negative implications as well.

3. What social media publishers get the most love from those in love? What channels are people using to display their love?

This is such a great question. What we’ve found is that it often hinges on age – age of those in-love, and the age of the relationship. For teenagers, tools like Tumblr and Facebook provide a very public space to flirt and date in the passive presence of their friends, which I think is really important at that age. What really surprised us is that this is also true for more mature people.

Right now, we’re really focusing on the impact of geo-location in romantic relationships in our analysis. A few years ago I met an ex-boyfriend through Twitter’s “Nearby Tweets” feature. It was really convenient, initially, because he lived across the street from me! Of course, when we broke up, the geolocation API haunted me every time I left my apartment (at least for a few weeks).


Melissa Read (Ph.D) of the Digital Love Project

4. By studying Facebook relationship changes, researchers were able to find out that breakups spiked highest around the holidays. What are some of the interesting takeaways you’ve learned about relationships by studying them online?

I loved the findings of David McCandless and Lee Byron’s project – and of course, how beautifully those findings were visualized. Some of what we’ve found is similar, particularly during wedding season, back-to-school season, and spring break.

We also found that social media is being used as a feedback tool for people in relationships. This might mean Instagramming a photo of a dinner cooked together as a way to express appreciation to a partner. Conversely, this could also mean using Facebook to give negative feedback to your significant other. I’m fascinated by the way some couples use social media tools like Tumblr as a sort of ‘relationship backchannel’ where they document and discuss the life they share together with both their online and offline friends. I think that our long national obsession with the negative effects of oversharing and narcissism online is passing. We increasingly view social media as tools to grow meaningful relationships, and perhaps even to contribute something of value to our culture. Just look at Facebook introducing a same-sex marriage status option a few months ago.

5. What do you want people, especially marketers, to take away from what you learned? How do marketers reach couples in love besides targeting wedding ads at the newly engaged?

We’ve found that there are very tactical takeaways for marketers when it comes to engaged couples. As the economy continues to improve, we’ve seen the wedding industry rebound as well: we spend $74 billion per year on weddings, and that fact alone has obvious implications for marketers within that industry. We’re seeing event marketers using Pinterest to drive sales, planners using tools like Evernote to make the planning process more collaborative, and photographers incorporating Instagram into their toolkit.

Again, the tactical implications for that industry are interesting, but we’re trying to think even bigger. Social media are being used to establish, build and optimize romantic relationships in new and innovative ways. Swap “romantic relationships” for “consumer-brand relationships” and we have the beginning of a really interesting discussion.

Footnote: Melissa and I would like to thank the members of the SapientNitro Marketing Strategy and Analysis Internship program for their significant contributions to this research program.

Seth McGuire on Social Media and the Stock Market

Gnip’s Seth McGuire was on CNBC’s Squawk Box speaking to Andrew Ross Sorkin about social media and how investors can use data from social networks as part of their strategy. Gnip has been providing social data to the financial industry for more than a year, with clients including hedge funds, banks and signal/data providers.  Specifically, Seth spoke to how hedge funds and other traders are using social data as a variable in their algorithms, as well as a research product for deeper analysis of an equity.

Andrew and Seth also talked about how news frequently breaks on Twitter (a famous example is the death of Osama bin Laden). This type of breaking news on StockTwits and Twitter provides a valuable signal that frequently arrives ahead of mainstream news. (As we’ve blogged before, natural disasters are often reported on Twitter before anywhere else.) Seth also discussed yesterday’s blog post from our data scientist, Scott Hendrickson, on JP Morgan’s $2 billion trading loss and how the news traveled through different social media publishers.

Gnip has also seen that while false stories may be shared on Twitter, Twitter is also quick to suppress them through crowdsourced responses that question the integrity of those stories.

Squawk Box guest host Doug Dachille posed an interesting question: have any financial regulators reached out to use Gnip? While Gnip serves government agencies in areas like disaster relief, right now it’s the compliance and data management departments at banks and funds who are more concerned about social media. Most firms lock down the ability to post content on social networks, given SEC and FINRA restrictions, but when compliance officers walk the floor they see traders peeking at their iPhones or iPads for breaking news and analysis on Twitter and StockTwits. From a compliance perspective, that’s dangerous… but they know the data is valuable, so they’re seeking new ways (like Gnip) to bring that data in-house for controlled analysis.

Interested in learning more on social data and the stock market? Email info at gnip.com.

New Twitter Filtering Options

You asked and we delivered! Based on our customers’ feedback, we’ve introduced a number of new operators to the Twitter PowerTrack stream. With these new operators you can filter more precisely on geo data and the contents of a user’s Twitter profile. Check out the details below and let us know if you have any questions.

GEO OPERATORS
country_code
Many Tweets with geo data are tagged with a “place”, and these places are often associated with a country code indicating where the place is located. With our new country_code operator, you can filter for all Tweets whose place has a specific country code. Use ISO 3166-1 alpha-2 codes to create a rule like country_code:gb for all Tweets with a “place” in the United Kingdom.
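
To make the rule concrete, here is a minimal Python sketch of how a country_code filter could be evaluated against a single Tweet. It is not Gnip's implementation: the simplified Tweet dictionaries, the `place.country_code` layout (modeled on Twitter's native JSON) and the case-insensitive comparison are all assumptions.

```python
from typing import Any, Dict


def matches_country_code(tweet: Dict[str, Any], code: str) -> bool:
    """Return True if the Tweet's place carries the given alpha-2 country code."""
    place = tweet.get("place") or {}  # many Tweets have no place (None or missing)
    tweet_code = place.get("country_code")
    if not tweet_code:
        return False
    # Assumption: compare case-insensitively, since a rule is written as
    # country_code:gb while the payload value may be uppercase ("GB").
    return tweet_code.lower() == code.lower()


london = {"text": "Tea time", "place": {"full_name": "London, England", "country_code": "GB"}}
no_place = {"text": "Hello", "place": None}

print(matches_country_code(london, "gb"))    # True
print(matches_country_code(london, "us"))    # False
print(matches_country_code(no_place, "gb"))  # False
```

Note that a Tweet without a place never matches, so this operator only sees the subset of Tweets that carry place data.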

place_contains
As mentioned in the country_code description, many Tweets with geo data are tagged with a “place”. This place is a semi-normalized location determined by the Twitter app. Examples might include “Boulder, CO” or “Jimmy’s Pizza”. Because this text is at best semi-normalized, we have created a place_contains operator, a complement to our place operator, that performs a substring match. For example, place_contains:"Boulder" would match a Tweet with a place of “Boulder” as well as a Tweet with a place of “Boulder, CO”, whereas place:"Boulder" would match only the former.
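
The difference between place and place_contains can be sketched in a few lines of Python. This is an illustration under assumptions, not Gnip's matcher: place is modeled as a whole-value match (which reproduces the Boulder example above), both comparisons are case-insensitive, and the place name is read from a `full_name` field modeled on Twitter's native JSON.

```python
from typing import Any, Dict


def place_name(tweet: Dict[str, Any]) -> str:
    """Hypothetical helper: the place's full name, or "" if the Tweet has no place."""
    return (tweet.get("place") or {}).get("full_name") or ""


def matches_place(tweet: Dict[str, Any], value: str) -> bool:
    # Modeled as a whole-value match: "Boulder" does not match "Boulder, CO".
    return place_name(tweet).lower() == value.lower()


def matches_place_contains(tweet: Dict[str, Any], value: str) -> bool:
    # Substring match: the rule value may appear anywhere in the place name.
    return value.lower() in place_name(tweet).lower()


city = {"place": {"full_name": "Boulder"}}
city_state = {"place": {"full_name": "Boulder, CO"}}

print(matches_place(city, "Boulder"), matches_place(city_state, "Boulder"))  # True False
print(matches_place_contains(city, "Boulder"), matches_place_contains(city_state, "Boulder"))  # True True
```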

bio_location
Users can specify a location in their Twitter profile. This field is completely freeform text, and the locations are not normalized at all. To let customers get Tweets from users whose location is “Boulder” as well as users whose location is “Boulder, CO”, we’ve introduced a bio_location operator that performs a tokenized keyword match on the contents of the field.
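
A tokenized keyword match can be sketched as "split the field into words, then look for the keyword as a whole word". The Python below assumes a simple tokenizer (lowercase, split on non-word characters) and single-word keywords; Gnip's actual tokenization rules may differ.

```python
import re
from typing import Set


def location_tokens(location: str) -> Set[str]:
    """Lowercase the free-form location and split it into word tokens."""
    return set(re.findall(r"\w+", location.lower()))


def matches_bio_location(location: str, keyword: str) -> bool:
    # Tokenized keyword match: the keyword must equal one whole token.
    return keyword.lower() in location_tokens(location)


for loc in ["Boulder", "Boulder, CO", "boulder co", "Boulderado"]:
    print(loc, "->", matches_bio_location(loc, "Boulder"))
# Boulder -> True
# Boulder, CO -> True
# boulder co -> True
# Boulderado -> False
```

Because the match is on whole tokens, punctuation and spacing variants of “Boulder” match, while a longer word that merely starts with “Boulder” does not.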

bio_location_contains
In the same vein as the bio_location operator, the bio_location_contains operator lets you filter the Twitter stream based on the location users have specified in their Twitter bio. However, bio_location_contains performs a substring match on this field rather than a tokenized keyword match.
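
For contrast with the tokenized match, here is a minimal sketch of a substring match on the bio location, assuming a case-insensitive comparison. The second query shows the trade-off: substrings catch more spelling variants, but short values can also hit unrelated locations.

```python
def matches_bio_location_contains(location: str, substring: str) -> bool:
    # Substring match: the rule value may appear anywhere in the field,
    # including inside a longer word (case-insensitive, an assumption).
    return substring.lower() in location.lower()


locations = ["Boulder, CO", "Boulderado", "San Francisco", "Denver"]
print([loc for loc in locations if matches_bio_location_contains(loc, "Boulder")])
# ['Boulder, CO', 'Boulderado']
print([loc for loc in locations if matches_bio_location_contains(loc, "CO")])
# ['Boulder, CO', 'San Francisco']
```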

time_zone
Gnip’s new time_zone operator allows customers to filter the Twitter firehose for Tweets whose time_zone exactly matches the value provided in the rule. The values delivered in the time zone field of the payload are normalized, based on the time zone descriptions Twitter offers in account settings. Note that this is a string-based filter, so the rule value must be an exact match.
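
An exact string filter is easy to get wrong by supplying a shortened or differently capitalized value. The sketch below assumes the time zone sits in `user.time_zone` (as in Twitter's native JSON) and that "exact" includes case; the sample value is illustrative only.

```python
from typing import Any, Dict


def matches_time_zone(tweet: Dict[str, Any], zone: str) -> bool:
    # Exact string equality: no trimming, no case folding, no partial match.
    user = tweet.get("user") or {}
    return user.get("time_zone") == zone


tweet = {"user": {"time_zone": "Mountain Time (US & Canada)"}}
print(matches_time_zone(tweet, "Mountain Time (US & Canada)"))  # True
print(matches_time_zone(tweet, "Mountain Time"))                # False
print(matches_time_zone(tweet, "mountain time (us & canada)"))  # False
```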

USER PROFILE OPERATORS
bio_lang
Within their Twitter profile, users are required to select a language. This setting simply changes the language in which Twitter displays its UI text (it does not translate Tweet text). THIS IS NOT A LANGUAGE CLASSIFICATION. Customers have reported that this setting is often left at its default of English even when an account’s Tweets are in another language. We recommend using it in conjunction with Gnip’s language classification operator (lang) rather than as a standalone indicator of a user’s or Tweet’s language.
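
The sketch below illustrates why bio_lang alone is a weak signal. It assumes the profile UI language sits in `user.lang`, uses a hypothetical `classified_lang` field to stand in for a language classification of the Tweet text, and assumes that two terms in one rule are combined with AND.

```python
from typing import Any, Dict


def matches_lang(tweet: Dict[str, Any], code: str) -> bool:
    # Stand-in for a language classification of the Tweet text.
    # "classified_lang" is a hypothetical field name.
    return tweet.get("classified_lang") == code


def matches_bio_lang(tweet: Dict[str, Any], code: str) -> bool:
    # The profile's UI language setting, often left at the default "en".
    return (tweet.get("user") or {}).get("lang") == code


def matches_both(tweet: Dict[str, Any], code: str) -> bool:
    # Both terms must hold (AND), as in a rule pairing lang and bio_lang.
    return matches_lang(tweet, code) and matches_bio_lang(tweet, code)


default_ui = {"text": "¡Hola a todos!", "classified_lang": "es", "user": {"lang": "en"}}
spanish_ui = {"text": "¡Buenos días!", "classified_lang": "es", "user": {"lang": "es"}}

print(matches_bio_lang(default_ui, "es"), matches_lang(default_ui, "es"))  # False True
print(matches_both(spanish_ui, "es"))  # True
```

A filter on bio_lang alone misses the first Tweet even though its text is Spanish; the classification catches it, and adding bio_lang narrows results to users who also chose that UI language.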

followers_count
Like Klout score, a user’s follower count can be used as a proxy for influence, and we have created a similar operator to let customers filter on this data. Like the klout_score operator, the followers_count operator filters the Twitter firehose for Tweets from users whose follower count falls within a range or exceeds a given value. WARNING: Use this operator with caution, as it can easily result in the unexpected delivery of very high volumes of data.
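
Here is a rough Python model of the two forms described above, a lower bound and a range. The rule syntax (`1000` for a threshold, `1000..10000` for a range), the inclusive bounds and the `user.followers_count` field are assumptions for illustration; check the operator documentation for the exact syntax.

```python
from typing import Any, Dict, Optional, Tuple


def parse_followers_rule(value: str) -> Tuple[int, Optional[int]]:
    """Parse "1000" (lower bound only) or "1000..10000" (inclusive range)."""
    if ".." in value:
        low, high = value.split("..", 1)
        return int(low), int(high)
    return int(value), None


def matches_followers_count(tweet: Dict[str, Any], value: str) -> bool:
    low, high = parse_followers_rule(value)
    count = (tweet.get("user") or {}).get("followers_count", 0)
    if high is None:
        return count >= low          # at or above the threshold
    return low <= count <= high      # within the inclusive range


tweet = {"user": {"followers_count": 2500}}
print(matches_followers_count(tweet, "1000"))        # True
print(matches_followers_count(tweet, "1000..2000"))  # False
print(matches_followers_count(tweet, "2000..5000"))  # True
```

The open-ended form is the one the warning above applies to: a low threshold with no upper bound matches a very large share of the firehose.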

bio_name_contains
The bio_name_contains operator allows customers to filter for Tweets from users whose displayName, the name entered in their profile, contains a given substring.
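
A minimal sketch of the idea, assuming a case-insensitive substring test. In Twitter's native JSON the display name is the user object's `name` field, distinct from the `screen_name` handle; this sketch checks only the display name, which is an assumption about the operator's scope.

```python
from typing import Any, Dict


def matches_bio_name_contains(tweet: Dict[str, Any], substring: str) -> bool:
    # Checks only the display name ("name"), not the @handle ("screen_name").
    name = (tweet.get("user") or {}).get("name") or ""
    return substring.lower() in name.lower()


shop = {"user": {"name": "Boulder Pizza Co.", "screen_name": "bldrslice"}}
print(matches_bio_name_contains(shop, "pizza"))  # True
print(matches_bio_name_contains(shop, "slice"))  # False (only in the handle)
```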

has:media
The has:media operator matches all Tweets that contain a media URL in the Tweet body, be it an image, video, or otherwise. The operator evaluates to true when the media entity is included in the Tweet payload delivered by Twitter. Gnip performs no media detection or extraction in support of this operator.
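
Because has:media relies only on the payload, it can be modeled as a presence check on the media entity. The sketch assumes the entity lives under `entities.media`, as in Twitter's native JSON; a plain link to an image hosted elsewhere does not count, since nothing is fetched or inspected.

```python
from typing import Any, Dict


def has_media(tweet: Dict[str, Any]) -> bool:
    # True only when the payload carries a non-empty media entity list;
    # links in other entities are not inspected or fetched.
    return bool((tweet.get("entities") or {}).get("media"))


photo = {"entities": {"media": [{"type": "photo", "media_url": "http://example.com/a.jpg"}]}}
linked_image = {"entities": {"urls": [{"expanded_url": "http://example.com/b.jpg"}], "media": []}}
print(has_media(photo))         # True
print(has_media(linked_image))  # False
```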

You can find additional documentation on each of these new operators and the rest of our Twitter filtering options at http://support.gnip.com/customer/portal/articles/600544-twitter-powertrack-operators

As always, we love to hear feedback and suggestions from our customers, and much of our product roadmap and prioritization is driven by customer needs and requests.  Keep the feedback coming via your account reps and info@gnip.com.

Data Stories: Rumi Chunara on Identifying Epidemics With Social Data

Data Stories is Gnip’s opportunity to tell the cool stories of the data scientists, data journalists and others working in social data. This week we’re interviewing Rumi Chunara, an Instructor at Harvard Medical School and HealthMap and a Big Boulder speaker, about her work using social data to identify epidemics. Rumi has a background in building biological sensors, which translated into an interest in using social media data and other informal data sources to identify epidemics. In addition to her work with HealthMap, Rumi was part of a study showing how Twitter could help identify cholera outbreaks in Haiti. You can follow her on Twitter at @rumichunara.

Rumi Chunara of HealthMap on using social data to identify epidemics

1. You’re currently trying to use social data to identify health epidemics. How did you get started doing this, and what was your career path to get there?

During school I studied engineering, and my research involved building portable bio-sensors. One idea behind this was to make measurements typically done in laboratories possible in places where that infrastructure is not available. I thought the concept was really neat and useful, but I realized that the types of tools and technologies we can build will always be changing and getting better. So I decided to start working on how we can bring all of these novel information sources together, and how we can use them to improve health for populations.

2. What has been the success of HealthMap in identifying epidemics? What do you see as the future of systems like HealthMap?

It’s neat to see how HealthMap has become a pioneer in demonstrating the value of informally collected health information. This has meant showing what types of information can be used and how they can be aggregated and analyzed. By demonstrating that informal sources add value, whether by giving an earlier signal, allowing a more detailed understanding of a disease outbreak, or reaching more people, I think that in the future we will accept and look to other new sources of health information, which in aggregate can become extraordinarily valuable.

3. What are some of the different methods you’ve used to help understand epidemics?

Our research group began by using informal data such as Internet search queries, news media and data from mobile phones to monitor and understand disease outbreaks. At HealthMap we have since expanded to other sources, such as data from social media. Beyond harnessing existing data and looking for health information in it, we are also building systems that specifically ask people about their health via the Internet. The beauty of this type of surveillance is that it works and has value wherever people can access the Internet, which includes many parts of the world. Social media is also well suited to understanding the spread of disease because it can help identify where you are and whom you are connected to, and it can reach a lot of people, all of which matter in the spread of disease!

4. Your research found that Twitter could have helped identify cholera outbreaks in Haiti. What were the takeaways on using social data to identify an outbreak after a natural disaster?

We have shown, for a particular outbreak situation, how Twitter could be used to identify the outbreak early on; other groups have also shown this for other disease outbreaks. Our work went further, demonstrating that the Tweets could also be used to get a sense of how an outbreak is progressing. One lesson from our study was why social data can vary during an outbreak (for example, because of an emerging or ongoing public health event, media coverage, or other environmental events happening at the same time). We also learned that each situation will be different depending on the context around the event. The biggest lesson we hope comes out of our study is that these novel types of data have potential uses that should be explored further.

5. What do you see as the future of social data in the health world?

Because of all of the benefits we have demonstrated from social data, my view is that it will be useful to harness it as a complement to the other data sources we already have, such as traditional case and hospitalization information, not as a replacement for our medical and public health infrastructure. Informal social data can fill gaps in the data sources traditionally used in healthcare, and importantly, it will hopefully empower individuals to become more involved with and proactive about their own health!

We appreciate Rumi taking the time to speak with us!  Let us know in the comments if you have a suggestion for another interviewee for Data Stories. 

Using Social Data to Identify Epidemics
