Big Boulder 2013

Big Boulder’s back for 2013 and better than ever.

The leaders in social data, including Facebook, Twitter, Tumblr, Foursquare, Automattic, Disqus and many more, are descending on Boulder again this summer to talk about the future of their platforms. Last year was a huge success, and expectations this year are even higher. We have a line-up that will deliver!

Headshots for Big Boulder

We’ll go deep into Asia and Latin America with speakers from China, Brazil and Japan, including the CEO of LINE, one of the fastest growing social networks on the planet. We’ll hear about non-traditional applications of Social Data with discussions on Finance, Government, Academic Research and Data Science. And to help us make sense of it all, we’ll have industry analysts discussing their views of the future. See the agenda and speakers pages for all the details.

In addition to all the great topics covered in the sessions, we’ve left plenty of time for networking with others in Social Data, including sunset cocktails with views of the Flatirons, a bicycle pub crawl, and since this is Boulder after all, morning yoga and hiking.

Big Boulder is an invite-only event for the leaders in the social data ecosystem. Space is filling up quickly so if you’re still thinking about it, sign up now before we hit capacity. Interested in coming but haven’t been invited? First check out our blog post about social data vs. social media. If you’re all about social data, email bre@gnip.com for information.

4 Things You Need To Know About Migrating to Version 1.1 of the Twitter API

Access to Twitter data through their API has been evolving since its inception. Last September, Twitter announced their most recent changes, which will take effect this coming March 5. These changes make enhancements to feed delivery, while further limiting the number of Tweets you can get from the public Twitter API.

The old API was version 1.0 and the new one is version 1.1. If your business or app relies on Twitter’s public API, you may be asking yourself “What’s new in Twitter API 1.1?” or “What changed in Twitter API 1.1?” While there’s not much new, a lot has changed and there are several steps you need to take to ensure that you’re still able to access Twitter data after March 5th.

1. OAuth Connection Required
In Twitter API 1.1, access to the API requires authentication using OAuth. To get your Twitter OAuth token, you’ll need to fill out this form.  Note that rate limits will be applied on a per-endpoint, per-OAuth token basis, and distributing your requests among multiple IP addresses will no longer work as a workaround. Requests to the API without OAuth authorization will not return data and will receive an HTTP 410 Gone response.

2. 80% Less Data
In version 1.0, the rate limit on the Twitter Search API was 1 request per second. In Twitter API 1.1, that changes to 1 request every 5 seconds. Put more starkly, you could previously make 3600 requests/hour but are now limited to 720 requests/hour for Twitter data. Combined with the existing limits on the number of results returned per request, this makes it much more difficult to consume the volume or coverage of data you previously could through the Twitter API. If the new rate limit is an issue, you can get full coverage commercial grade Twitter access through Gnip, which isn’t subject to rate limits.
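
The arithmetic behind the 80% figure is easy to check. Here is a minimal sketch; the function and variable names are ours for illustration and are not part of any Twitter library:

```python
SECONDS_PER_HOUR = 3600

def requests_per_hour(seconds_between_requests):
    """Return how many requests fit in one hour at the given spacing."""
    return SECONDS_PER_HOUR // seconds_between_requests

old_limit = requests_per_hour(1)  # version 1.0: 1 request per second
new_limit = requests_per_hour(5)  # version 1.1: 1 request every 5 seconds
reduction = 1 - new_limit / old_limit

print(old_limit, new_limit, f"{reduction:.0%}")  # 3600 720 80%
```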

3. New Endpoint URLs
Twitter API 1.1 also has new endpoint URLs that you will need to direct your application to in order to access the data. If you try to access the old endpoints, you won’t receive any data and will receive an HTTP 410 Gone response.
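
During a migration, one way to surface leftover calls is to treat a 410 response as a hint that a request still targets a retired endpoint or lacks OAuth credentials. The sketch below uses only the Python standard library; the function name and the message wording are illustrative assumptions, not Twitter's:

```python
from http import HTTPStatus

def describe_status(status_code):
    """Map an HTTP status code to a migration hint (illustrative wording)."""
    if status_code == HTTPStatus.GONE:  # 410
        return "410 Gone: check for a retired v1.0 endpoint or missing OAuth"
    if status_code == HTTPStatus.OK:    # 200
        return "200 OK: request succeeded"
    return f"{status_code}: see Twitter's error code documentation"

print(describe_status(410))
```

Logging these hints while Twitter runs its blackout tests may help find stray v1.0 calls before the final retirement date.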

4. Hello JSON. Goodbye XML.
Twitter has changed the format in which the data is delivered. In version 1.0 of the Twitter API, data was delivered in XML format. Twitter API 1.1 delivers data in JSON format only. Twitter has been slowly transitioning away from XML starting with the Streaming API and Trend API.  Going forward, all APIs will be using JSON and not XML. The Twitter JSON API is a great step forward as JSON has a much wider standardization than XML does.
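
For applications that previously parsed XML, reading a JSON response can be as simple as the sketch below. The payload is a hypothetical, heavily trimmed example written for illustration; real responses carry many more fields:

```python
import json

# Hypothetical, trimmed payload for illustration only.
raw = '{"id_str": "1", "text": "Hello JSON", "user": {"screen_name": "example"}}'

tweet = json.loads(raw)  # parse the JSON text into nested dicts
print(tweet["user"]["screen_name"], tweet["text"])
```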

All in all, some pretty impactful changes.  If you’re looking for more information, we’ve provided some links below with more details.  If you’re interested in getting full coverage commercial grade access to Twitter data where rate limits are a thing of the past, check out the details of Gnip’s Twitter offerings.  We have a variety of Twitter products, including realtime coverage and volume streams, as well as access to the entire archive of historical Tweets.

Update: Twitter has recently announced that the Twitter REST API v1.0 will officially retire on May 7, 2013. Between now and then they will continue to run blackout tests and those who have not migrated will see interrupted coverage so migrating as soon as possible is highly encouraged.

Helpful Links
Version 1.0 Retirement Post
Version 1.0 Retirement Final Dates
Changes coming in Twitter API 1.1
OAuth Application Form
REST API Version 1.1 Resources
Twitter API 1.1 FAQ
Twitter API 1.1 Discussion
Twitter Error Code Responses

Aspirational Brands & Tumblr: Lexus vs. Toyota

Gnip conducted a brief analysis of the Toyota family of brands (Toyota, 4Runner, Camry, Highlander, Lexus, Prius, Rav4, Scion, Sequoia, Tacoma, Tundra) on multiple social media platforms. We looked at brand mentions on Tumblr, Twitter, WordPress and WordPress comments during the period of Oct. 15 to Nov. 15, 2012.

As you would expect, Toyota was the most frequently mentioned brand on each social platform, with one enormous exception: Tumblr. Lexus had 5 times as many mentions on Tumblr as Toyota. This highlights how aspirational brands do exceptionally well on Tumblr, where niche communities of fans often form around brands. (Attention brand managers: this happens whether the company is involved or not.) A central component of Tumblr is visual content, which also plays well with aspirational brands. Furthermore, Tumblr content is both extremely viral and long-lived, meaning that content shared on Tumblr can be shared for longer periods of time and jump to more diverse sub-groups within the network than on other social networks. During the month Gnip tracked mentions, Lexus received more than 200,000 mentions while Toyota received 40,000.

In social media, it is easy to rely on Twitter as a kind of alert system for when content is being shared, but at Gnip we’ve seen time and time again that content that pops up elsewhere doesn’t always pop up on Twitter. Each social media network has its own attributes, audience and modes of interaction. Because of likes, reblogging, and the way timelines are read by Tumblr users, Tumblr has active communities that aren’t found elsewhere.

Lexus on Tumblr

Gnip Announces Partnership with Leader in Japanese Social Media Analytics

With more than 10% of the Twitter firehose in Japanese, the Japanese market for social data is a huge opportunity. This is why Gnip is excited to announce that we’re partnering with Hottolink as part of a strategic alliance to better serve Twitter data in Japan.

Through the alliance, Hottolink will have access to Gnip’s suite of products that serve data from Twitter’s full firehose. This data will power Hottolink’s social media listening platform with ongoing and historical access to Tweets in Japanese and every other language. By partnering with Hottolink, Gnip will have access to Hottolink’s technology and expertise, enabling Gnip to better meet the needs of the Japanese market.

Japan is the third-largest Tweeting population in the world with more than 30 million accounts and has some of the most active users in the world.  In fact, the world record for tweets-per-second was set in December 2011 during the television broadcast of the Japanese anime movie “Castle in the Sky,” with 25,088 tweets per second.

In Japan, they call a Tweet a “mumble” but the signal from Japanese language Tweets is loud and clear.  If you’re interested in learning more, please check out the press release (also in Japanese!) or email info@gnip.com.

SGI Launches Global Twitter Heartbeat, Powered by Gnip

File this under cool news.

SGI’s Big Brain Computer has created a Global Twitter Heartbeat, allowing the supercomputer to analyze the Twitter stream for sentiment and geolocation to create a Twitter heartbeat telling us how the world is feeling based on emotions communicated via Twitter. Not only is this a cool undertaking by the folks at SGI, but we’re proud to announce that it is powered by Gnip’s decahose Twitter stream.

To make this happen, SGI partnered with Kalev H. Leetaru of the University of Illinois and Dr. Shaowen Wang of the CyberInfrastructure and Geospatial Information (CIGI) Laboratory at the University of Illinois at Urbana-Champaign.

This isn’t just some simple stream.  The SGI supercomputer analyzes every Tweet to assign location (not just from GPS-tagged tweets, but by processing the text of the Tweet itself) and tone values, then visualizes the conversation in a heat map that puts Tweet location, Tweet density and tone into a unified geospatial perspective. The entire process from ingestion to data analysis to producing the heat map runs at a speed that allows visualization of a map frame per second.

To see it live, check out SGI’s Facebook page.

You can also see videos of the Twitter Heartbeat for the Presidential Elections and Hurricane Sandy.

A Moment in History: Access the Full Archive of Public Tweets

We are proud to announce that, for the first time, access to the entire historical archive of public Tweets, dating back to @Jack’s very first Tweet more than 7 years ago, is now available via our new product, Historical PowerTrack for Twitter. This product has been years in the making, and we can’t wait to see what the world will build with this data.

We believe that social data has unlimited value and near limitless application. The nature (fast & viral) and newness of social conversations has naturally directed focus to realtime applications. However, as the world becomes more reliant on realtime social data and the amount of social data created grows exponentially, the need to put this information into historical context has become increasingly important. Often, companies are considering the realtime reaction in social data and asking “is this good or bad?” This is one of the main questions historical data can answer. For example, if an auto manufacturer launches a new model and 25% of the social conversation is determined to be negative, is that healthy?  Knowing that the last model launched to record sales & had 40% negativity helps put the new realtime data into context.

Historical data can also be highly informative to predictions about the future. Researchers have suggested to us that they can predict the outcome of a revolution by studying past revolutions online such as the “Arab Spring”.  Likewise, we’re seeing hedge funds make a real commitment to incorporating social data into their trading algorithms. It is critical for these funds to be able to refine their predictive trading models by studying vast quantities of historical data.

Until now, all this promise of social data has had a foundational limitation: very little reliable and complete historical data has been available. And as we know, historical analysis is only as good as the quality of the underlying data. You can’t provide complete context if you only have part of the data.  That’s why we are so excited to be the first company to offer complete coverage of all public Tweets from the beginning of time.

We’re able to deliver the full historical corpus via our long-standing partnership with Twitter. We helped Twitter deliver the full archive of Tweets to the Library of Congress. That was a massive effort that took a long time. The rest of the social data ecosystem can benefit from that effort starting today.

This level of access has never been available and we know it is really going to accelerate the rate of innovation going forward. We think there are new products and businesses that will now be possible with access to a “social layer” of historical data. We frequently ask ourselves “If you could know what the world was saying at any moment in time about any topic, what could you build?”

We’ve already been working with companies like Esri, Union Metrics, Brandwatch, Waggener Edstrom Worldwide, and Texifter during our early access period and it’s been incredible to see how fast they are innovating with this new data.

Gnip aspires to be the source of record for all public conversation. That’s a lofty goal. We’re taking a major step forward with today’s announcement.

Want to learn more about Historical PowerTrack for Twitter?  Email info@gnip.com.

For the Times When Every Tweet is Too Many

Our customers tell us that getting every single Tweet that matters is one of the key reasons they work with Gnip. And sometimes getting every Tweet that matters means filtering out the Tweets you don’t want. With this in mind, I’m happy to announce the introduction of two new operators to our PowerTrack filtering suite.

Retweet Operator

The Retweet operator lets a customer either receive only the Retweets that match a rule or exclude Retweets from the results for that rule.

To use the Retweet operator, simply add is:retweet or -is:retweet to any rule.

Examples Include:

  • Receive only Retweets mentioning Apple using a rule like: apple is:retweet as a way to measure engagement of the brand’s fan base

or

  • Get only Tweets with unique content about Apple using a rule like: apple -is:retweet to monitor conversation about the brand and ignore the tremendous volume of retweets generated by the brand
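
As a small sketch of how an application might assemble these rules programmatically (the helper name is hypothetical and not part of any Gnip library; it only builds the rule text shown above):

```python
def retweet_rule(terms, only_retweets):
    """Append is:retweet (only Retweets) or -is:retweet (no Retweets) to a rule."""
    operator = "is:retweet" if only_retweets else "-is:retweet"
    return f"{terms} {operator}"

print(retweet_rule("apple", only_retweets=True))   # apple is:retweet
print(retweet_rule("apple", only_retweets=False))  # apple -is:retweet
```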

Sampling Operator

The Sampling operator allows a customer to receive a random sample of Tweets that match a rule rather than the entire set of Tweets.

There are several use cases where the Sampling operator is useful.  Say you want to stay within a budgeted number of Tweets each month, but you’re trending higher than that budget halfway through the month.  With the Sampling operator, you can scale back your consumption without fully eliminating rules.  In another use case you might want to monitor a very high-volume rule or user, but your internal systems can’t handle this volume.  Sampling makes this more manageable.  Finally, there are times when you simply need to know the directional volumes for things, and don’t need every tweet.

To use the Sampling operator, add sample:## to any rule, where ## is an integer value between 1 and 100. The Sampling operator applies to the entire rule and requires that any “OR’d” terms be grouped.

Examples Include:

  • Receive a sampling of 10% of all Tweets that contain “apple” using a rule like:

apple sample:10

or

  • Receive a sampling of 50% of all Tweets that contain “iPad” or “iPhone” using a rule like:

(ipad OR iphone) sample:50
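
The two constraints above (an integer from 1 to 100, and grouping of OR'd terms) can be enforced before a rule is submitted. This is a minimal sketch under our own assumptions: the helper name is hypothetical, and it only builds the rule string rather than talking to any Gnip service:

```python
def sampled_rule(terms, percent):
    """Build a rule that samples `percent` of Tweets matching any of `terms`."""
    if not isinstance(percent, int) or not 1 <= percent <= 100:
        raise ValueError("sample value must be an integer from 1 to 100")
    if not terms:
        raise ValueError("at least one term is required")
    # OR'd terms are grouped so the sample applies to the whole rule.
    body = terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"
    return f"{body} sample:{percent}"

print(sampled_rule(["apple"], 10))           # apple sample:10
print(sampled_rule(["ipad", "iphone"], 50))  # (ipad OR iphone) sample:50
```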

As always, thank you for the product feedback and keep it coming.  Additional documentation of these new operators and others can be found in our online documentation.

Using Social Data to Predict Financial Markets

Seth McGuire, Gnip’s director of assets and financial technology, was on both Fox Business’ After the Bell and Bloomberg’s Money Moves to discuss how social data is being used by hedge funds as another layer of data to analyze the markets. Gnip is currently working with about a dozen hedge funds that have over $1 billion in assets under management (AUM).

Watch Seth on After the Bell

Watch Seth on Money Moves

Enhanced Filtering for PowerTrack

Gnip is always looking for ways to improve its filtering capabilities and customer feedback plays a huge role in these efforts.  We are excited to announce enhancements to our PowerTrack product that allow for more precise filtering of the Twitter Firehose, a feature enhancement request that came directly from you, our customers.

Gnip PowerTrack rules now support OR and Grouping using ().  We have also loosened limitations on the number of characters and the number of clauses per rule. Specifically, a single rule can now include up to 10 positive clauses and up to 50 negative clauses (previously 10 total clauses).  Additionally, the character limit per rule has grown from 255 characters to 1024.

With these changes, we are now able to offer our customers a much more robust and precise filtering language to ensure you receive the Tweets that matter most to you and your business.  However, these improvements bring their own set of specific constraints that are important to be aware of.  Examples and details on these limitations are as follows:

OR and Grouping Examples

  • apple OR microsoft
  • apple (iphone OR ipad)
  • apple computer -(fruit OR green)
  • (apple OR mac) (computer OR monitor) new -fruit
  • (apple OR android) (ipad OR tablet) -(fruit green microsoft)

Character Limitations

  • A single rule may contain up to 1024 characters including operators and spaces.

Limitations

  • A single rule must contain at least 1 positive clause
  • A single rule supports a max of 10 positive clauses throughout the rule
  • A single rule supports a max of 50 negative clauses throughout the rule
  • Negated ORs are not allowed. The following are examples of invalid rules:
      • -iphone OR ipad
      • ipad OR -(iphone OR ipod)
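
A rough client-side check for these limits might look like the sketch below. It is not Gnip's validator: the function name is hypothetical, and it assumes that every whitespace-separated term counts as one clause and that every term inside a negated group counts as a negative clause.

```python
import re

MAX_CHARS, MAX_POSITIVE, MAX_NEGATIVE = 1024, 10, 50

def check_rule(rule):
    """Return a list of problems found in `rule` (empty if none were found)."""
    problems = []
    if len(rule) > MAX_CHARS:
        problems.append("rule exceeds 1024 characters")

    positive = negative = 0
    stack = []            # one flag per open group: True if the group is negated
    prev_negated = False  # was the previous term or closed group negated?
    prev_was_or = False
    for tok in re.findall(r"-?\(|\)|[^\s()]+", rule):
        if tok == "OR":
            if prev_negated:
                problems.append("negated OR is not allowed")
            prev_was_or = True
            continue
        if tok == ")":
            prev_negated = stack.pop() if stack else False
            prev_was_or = False
            continue
        negated_here = tok.startswith("-")
        if negated_here and prev_was_or:
            problems.append("negated OR is not allowed")
        prev_was_or = False
        if tok in ("(", "-("):
            stack.append(negated_here)
            prev_negated = False
            continue
        # A plain term: negative if it is negated or sits in a negated group.
        if negated_here or any(stack):
            negative += 1
        else:
            positive += 1
        prev_negated = negated_here

    if positive == 0:
        problems.append("rule needs at least 1 positive clause")
    if positive > MAX_POSITIVE:
        problems.append("more than 10 positive clauses")
    if negative > MAX_NEGATIVE:
        problems.append("more than 50 negative clauses")
    return problems

print(check_rule("apple (iphone OR ipad)"))  # []
print(check_rule("-iphone OR ipad"))
```

Running a check like this before submitting rules can catch the most common mistakes early, though the service's own validation remains the authority.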

Precedence

  • An implied “AND” takes precedence in rule evaluation over an OR

For example a rule of:

  • android OR iphone ipad would be evaluated as android OR (iphone ipad)
  • ipad iphone OR android would be evaluated as (ipad iphone) OR android
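
To see how the precedence rule groups a flat rule (one without parentheses of its own), the sketch below splits on OR first and then wraps each run of implicitly AND'd terms in parentheses. The function name is hypothetical, and this only illustrates the grouping; it is not Gnip's rule parser:

```python
def show_precedence(rule):
    """Parenthesize a flat rule the way AND-before-OR evaluation implies."""
    groups = [part.split() for part in rule.split(" OR ")]
    rendered = []
    for terms in groups:
        joined = " ".join(terms)
        # Only multi-term runs need explicit parentheses.
        rendered.append(f"({joined})" if len(terms) > 1 else joined)
    return " OR ".join(rendered)

print(show_precedence("android OR iphone ipad"))  # android OR (iphone ipad)
print(show_precedence("ipad iphone OR android"))  # (ipad iphone) OR android
```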

You can find full details of the Gnip PowerTrack filtering changes in our online documentation.

Know of another way we can improve our filtering to meet your needs?  Let us know in the comments below.