Google+ Now Available from Gnip

Gnip is excited to announce the addition of Google+ to its repertoire of social data sources. Built on top of the Google+ Search API, Gnip’s stream allows customers to consume real-time social media data from Google’s fast-growing social networking service. Using Gnip’s stream, customers can poll Google+ for public posts and comments matching the terms and phrases relevant to their business and client needs.
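
For a sense of what this polling looks like at the API level, here is a minimal sketch against the Google+ Search API’s activities endpoint, as we understand it (the API key and search term are placeholders):

    import requests  # third-party HTTP client

    GOOGLE_API_KEY = "YOUR_GOOGLE_API_KEY"  # placeholder -- supply your own key
    SEARCH_URL = "https://www.googleapis.com/plus/v1/activities"

    def search_public_posts(query, max_results=20):
        """Poll the Google+ Search API for recent public posts matching a query."""
        params = {
            "query": query,             # the terms and phrases you care about
            "orderBy": "recent",        # newest activities first
            "maxResults": max_results,
            "key": GOOGLE_API_KEY,
        }
        resp = requests.get(SEARCH_URL, params=params)
        resp.raise_for_status()
        return resp.json().get("items", [])

    for post in search_public_posts("social media analytics"):
        print(post.get("actor", {}).get("displayName", ""), "-", post.get("title", ""))

Gnip’s stream takes care of this polling for you and layers on the normalization, URL unwinding, and deduplication described below.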

Google+ is an emerging player in the social networking space and a great pairing with the Twitter, Facebook, and other microblog content currently offered by Gnip. If you are looking for volume, Google+ became the third-largest social networking platform within a week of its public launch, and some project it will become the world’s second-largest social network within the next twelve months. Looking to consume content from social network influencers? Google+ is where they are (even former Facebook president Sean Parker says so).

By consuming a stream of Google+ data through Gnip (alongside an abundance of other social data sources), you’ll have access to a normalized data format, unwound URLs, and data deduplication. Existing Gnip customers can seamlessly add Google+ to their Gnip Data Collectors (all you need is a Google API Key). New to Gnip? Let us help you design the right solution for your social data needs: contact sales@gnip.com.

Links & The Twitter Firehose

One of the more interesting components of Twitter streams is the links within the Tweets themselves. Not only are links one way to bridge from traditional web trend analysis to social media, but they are also a window into what people are sharing.

Gnip provides three mechanisms to get at links in Tweets.

  • Link Stream. The link stream provides you with 100% of the Tweets that contain links. Furthermore, Gnip enriches the stream with unwound URLs, so you don’t have to bother with an unwind-farm on your end.
  • Power Track’s ‘has:links’ operator. Through Power Track, you can refine your complex queries (including substring matching) to collect only Tweets that contain links.
  • Power Track’s ‘url_contains:’ operator. The ‘url_contains:’ operator allows you to filter the 100% Firehose for Tweets that have links and contain the substring you provide. It filters against both short and long URLs (see the sketch just below for these operators in action).
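
For a rough sense of how these operators come together, here is a minimal sketch of consuming a filtered stream (the endpoint URL, account name, and credentials are placeholders; your actual stream details and the exact rule syntax are in the documentation):

    import requests  # third-party HTTP client

    # Illustrative rules combining the operators above (hypothetical examples):
    #   'gnip has:links'                   -- Tweets mentioning "gnip" that contain links
    #   'has:links url_contains:"nytimes"' -- link Tweets whose URL contains "nytimes"

    # Placeholder endpoint -- your account's stream URL will differ.
    STREAM_URL = "https://stream.gnip.com/accounts/ACCOUNT/track.json"

    with requests.get(STREAM_URL, auth=("user", "password"), stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:                          # skip keep-alive newlines
                print(line.decode("utf-8"))   # one JSON activity per line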

Happy filtering!

Announcing Power Track – Full Firehose filtering for the Tweets you want

The response to the commercial Twitter streams we’ve made available has been outstanding. We’ve talked to hundreds of companies building growing businesses that analyze conversations on Twitter and other social media sites. As Twitter’s firehose continues to grow (now over 110 million Tweets per day), we’re hearing more and more requests for a way to filter the firehose down to the Tweets that matter.

Today, we’re announcing a new commercial Twitter product called Power Track. This is a keyword-based filter of the full firehose that provides 100% coverage of a stream you define. Power Track customers no longer have to deal with polling rate limits on the Search API or volume limits on the Streaming API.

In addition to keyword-based filters, Power Track also supports boolean operators and many of the custom operators allowed on the Twitter Search API. With Power Track, companies and developers can define the precise slice of the Twitter stream they need and be confident they’re getting every Tweet, without worrying about volume restrictions.

Currently we support operators for narrowing the stream to a set of users, matching against unwound URLs, filtering by location, and more; we’ll continue to add ways for our customers to filter the content relevant to them. Check the documentation for the technical details of these operators.
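
To make that concrete, here are a few hypothetical rules (a sketch only; operator names such as from: and has:geo follow the documentation, and the queries themselves are invented for illustration):

    # Hypothetical Power Track rules -- operator names per the documentation.
    rules = [
        "(coffee OR espresso) from:starbucks",    # boolean logic, narrowed to one user
        'has:links url_contains:"nytimes.com"',   # matched against the unwound URL
        "superbowl has:geo",                      # only Tweets carrying location data
    ]
    for rule in rules:
        print(rule)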

Gnip is here to ensure the enterprise marketplace gets the depth, breadth, and reliability of social media data it requires. Please contact us at info@gnip.com to find out more.

Gnip to power 301works.org

Every once in a while there is an opportunity to make a real difference in the industry. 301works.org is just such an opportunity for Gnip and the companies we are teaming up with to launch a much-needed independent URL-mapping directory service.

First, many thanks to Adjix, awe.sm, betaworks, bit.ly, Cligs, URLizer, and urlShort, who have joined with us to launch this new organization. You can read the full 301works announcement posted on Gnip.com.

Why is Gnip involved? We are part of the Internet software community, and most of us are also active social media users. While parts of the industry still debate the pros and cons of short URLs, it is obvious that the last few years have seen huge growth in the adoption of short URL formats across the web, and increasingly custom short URLs are being used by businesses and individuals.

People generate short URLs every day, and they need to know that these mappings will continue to function as intended, that the mappings will be available for them to use in the future, and that their privacy preferences will be respected. With short URL formats having reached general acceptance by millions of users in their daily activities, the industry needed a way to ensure the connections provided by these mappings persist over time.

By providing the technology to power the 301works solution, Gnip is ensuring that the social connections and data represented by the millions of URL mappings created every day continue to be available across the web.

We are thrilled to be able to participate and to do our part in helping sustain and grow an open web.

Controlling Data Through URL Shorteners

I’m going to sidestep the “URL shorteners are bad because they obfuscate” discussion in this post. If you’re reading this, you likely have an opinion one way or another on that topic, but let’s leave that at the door. A bigger challenge is emerging as URL shortening continues to proliferate.

Web browsers unwinding a shortened URL when a user clicks on one is one thing, but when system software tries to unwind/resolve shortened URLs en masse, a problem emerges. The database that binds each short URL to its long version is hidden behind an API that can’t handle, or won’t allow (and I’m pointing at all of you URL shorteners out there), bulk unwinding of shortened URLs. The result is a bottleneck (the URL shortening services) that prevents “real-time” indexing of otherwise publicly available content. “Classic” offline, crawl-based search engines (e.g., Google and Y!) will likely unwind in a latent, “offline” manner, based on relevance. Real-time search facilities, however, are faced with unwinding large numbers of shortened URLs on the fly, and there doesn’t appear to be a way to accomplish this as the volume and rate of shortened URLs in daily social activity keep increasing.
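
To see why this is hard at scale, consider what unwinding a single URL involves: at least one HTTP round trip per link. A minimal sketch in Python (the short link is hypothetical):

    import requests  # third-party HTTP client

    def unwind(short_url, timeout=5):
        """Follow the redirect chain and return the final (long) URL."""
        # HEAD keeps responses small; some shorteners only redirect on GET,
        # in which case a GET request is the fallback.
        resp = requests.head(short_url, allow_redirects=True, timeout=timeout)
        return resp.url

    print(unwind("http://bit.ly/example"))  # hypothetical short link

One round trip per click is fine for a browser; millions of round trips a day against rate-limited shortener APIs is not.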

If your business relies on unwinding large volumes of shortened URLs in real time, you’re faced with the usual optimization suspects: caching and relevance/prioritization-based resolution. These will improve your ability to “keep up,” but they are a function of cache hit ratios (which are generally poor in the social space when it comes to URL unwinding) and your own ability to decide what to unwind in an ever-increasing volume of shortened URLs.
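
A minimal sketch of the caching half of that approach, assuming a simple in-memory cache (a production system would pair a shared cache with a prioritization queue):

    from functools import lru_cache

    import requests  # third-party HTTP client

    @lru_cache(maxsize=100_000)  # reuse results for links we've already resolved
    def unwind_cached(short_url):
        """Resolve a short URL, returning cached results for repeated links."""
        resp = requests.head(short_url, allow_redirects=True, timeout=5)
        return resp.url

An lru_cache is the simplest stand-in; the hit ratio, as noted above, is what ultimately governs throughput.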

The result is another case of data control. If URL shortener and vanity host/URL adoption continues, and all URLs turn into redirects, we become completely dependent on services that appear unwilling to open up their databases. I would like to see bulk unwinding become part of this emerging standard.