Gnip Launches YouTube Comments API

One of the main advantages of YouTube is that content often continues to see views long after it is posted. Because comments keep accumulating as well, it’s tough for social media managers to stay on top of them all. Popular brands often have hundreds of videos, and monitoring comments can be tedious.

This is why Gnip added access to the YouTube Comments API as part of our Enterprise Data Collector. With the YouTube Comments API, Gnip customers can easily track the comments for all the videos they care about long after the video is posted.

To see what kind of content populates YouTube comments, we did a bit of fun research. Our data scientist, Scott Hendrickson, looked at the most popular videos from some of the most popular brands on YouTube: Nike, Sephora, Volkswagen, Barbie and Nerdist. One area we were interested in was the language people use in YouTube comments. Looking at both light-hearted words and words indicating sentiment, we put together the Lol index: how often the words Lol, good, bad, love, hate, omg, lmao and wtf showed up as a percentage of total YouTube comments for that video.
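For readers who want to try something similar, here is a minimal sketch of that calculation. It assumes the index counts the share of comments that contain each word at least once; the function name `lol_index`, the tokenization and the sample comments are our assumptions for illustration, not the code Scott actually used.

```python
import re

# Words tracked in the post (lower-cased for matching).
LOL_WORDS = ["lol", "good", "bad", "love", "hate", "omg", "lmao", "wtf"]


def lol_index(comments):
    """Return {word: percent of comments containing that word at least once}."""
    total = len(comments)
    counts = {word: 0 for word in LOL_WORDS}
    for comment in comments:
        # Assumption: a "word" is a run of letters, matched case-insensitively.
        tokens = set(re.findall(r"[a-z]+", comment.lower()))
        for word in LOL_WORDS:
            if word in tokens:
                counts[word] += 1
    if total == 0:
        return {word: 0.0 for word in LOL_WORDS}
    return {word: 100.0 * n / total for word, n in counts.items()}


# Made-up sample comments, not real YouTube data.
sample = ["LOL this is so good", "I love this ad", "lol lol", "meh"]
index = lol_index(sample)
```

Counting each comment once per word keeps a single "lol lol lol" comment from dominating the index; counting raw occurrences instead would be an equally reasonable reading.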

Although YouTube comments have a reputation for negativity, we found that the word love was often used far more frequently than hate. Lol was the most frequently used term in sillier videos such as the Barbie Dreamhouse video, while comments on Nike’s inspirational video were more likely to include the word love.

Ultimately, we’re excited to launch a product that makes it easier for brands to monitor their YouTube comments.

Sephora’s most popular video is “Sephora Presents How to use Violent Lips.”

Sephora Violent Lips Lol Index

________________________________

Nerdist’s most popular video is “Sexy Jedi Bubblebath! Saber 2: Return of the Body Wash”

Nerdist YouTube Lol Index

________________________________

Nike’s most popular video is “Find Your Greatness.”

Nike YouTube Comments Lol Index

________________________________

Volkswagen’s most popular video is “The Force.”

Volkswagen YouTube Lol Index

________________________________

Barbie’s™ most popular video is “Life in the Dreamhouse — Happy Birthday Chelsea”

Barbie YouTube Lol Index

________________________________

Delivering 30 Billion Social Media Activities Monthly . . . and Counting

I’m excited to announce that, as of the end of October, Gnip is delivering over 30 billion paid social media activities per month to our customers. This is the largest number of paid social media activities that has ever been distributed in a 30-day period.

Over the past year, we’ve seen extraordinary growth in the number of paid social media activities we deliver. At the start of 2011, Gnip was delivering 300 million activities per month. By May, that number was up to 3 billion activities per month. And in October, we delivered 30 billion activities. In essence, we’ve been growing by a factor of 10 every 5 months. At this rate, we’ll be delivering 300 billion activities per month by March of next year.
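The extrapolation is plain compound growth. Here is a small sketch of the arithmetic, assuming the rate stays at exactly 10x every 5 months (a simplification of the figures above); `project` is a hypothetical helper name.

```python
def project(current, months_ahead, factor=10.0, period_months=5.0):
    """Project a monthly volume forward, assuming `factor`-fold growth every `period_months`."""
    return current * factor ** (months_ahead / period_months)


october = 30e9  # activities per month, from the post
march = project(october, months_ahead=5)  # October to March is 5 months

# Month-over-month growth implied by 10x every 5 months.
monthly_rate = 10.0 ** (1.0 / 5.0) - 1.0
```

Put another way, 10x every 5 months works out to volume growing by a bit more than half again each month.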

Cool numbers, but what’s driving this growth?

We’re seeing three key areas driving this number. First, we’re signing on new customers at an increasing rate, as more and more companies see the possibilities in social media data. Second, we’re seeing increased interest in our Twitter firehose products. From hedge funds using social data to drive trading strategies to business intelligence companies layering social data onto their existing structured data sources, interest in volume products from Twitter is consistently increasing. And finally, we’re seeing a marked increase in the number of customers using multiple sources to enrich their product capabilities. From boards and forums to YouTube and Facebook, our customers are seeing the potential in the many other social data sources we offer.

So, 300 billion per month by March? It’s a big number, but the way things are going, I’ll take the over.

Customer Spotlight – MutualMind

For many startups seeking to enter and capitalize on the rising social media marketplace, timing is everything. MutualMind was no exception: getting their enterprise social media management product to market in a timely manner was crucial to the success of their business. MutualMind provides an enterprise social media intelligence and management system that monitors, analyzes, and promotes brands on social networks and helps increase social media ROI. The platform enables customers to listen to discussion on the social web, gauge sentiment, track competitors, identify and engage with influencers, and use the resulting insights to improve their overall brand strategy.

“Through their social media API, Gnip helped us push our product to market six months ahead of schedule, enabling us to capitalize on the social media intelligence space. This allowed MutualMind to focus on the core value it adds by providing advanced analytics, seamless engagement, and enterprise-grade social management capabilities.”

- Babar Bhatti
CEO, MutualMind

By selecting Gnip as their data delivery partner, MutualMind was able to get their product to market six months ahead of schedule. Today, MutualMind processes tens of millions of data activities per month using multiple sources from Gnip including premium Twitter data, YouTube, Flickr, and more.
 
For the full details, read the success story here.

Our Poem for Mountain.rb

Hello and Greetings, Our Ruby Dev Friends,
Mountain.rb we were pleased to attend.

Perhaps we did meet you! Perhaps we did not.
We hope, either way, you’ll give our tools a shot.

What do we do? Manage API feeds.
We fight the rate limits, dedupe all those tweets.

Need to know where those bit.ly’s point to?
Want to choose polling or streaming, do you?

We do those things, and on top of all that,
We put all your results in just one format.

You write only one parser for all of our feeds.
(We’ve got over 100 to meet your needs.)

The Facebook, The Twitter, The YouTube and More
If mass data collection makes your head sore…

Do not curse publishers, don’t make a fuss.
Just go to the Internet and visit us.

We’re not the best poets. Data’s more our thing.
So when you face APIs… give us a ring.

Social Media in Natural Disasters

Gnip is located in Boulder, CO, and we’re unfortunately experiencing a spate of serious wildfires as we wind summer down. Social media has been a crucial source of information for the community here over the past week as we have collectively Tweeted, Flickred, YouTubed and Facebooked our experiences. Mashups depicting the fires and associated social media started emerging quickly after the fires began. VisionLink (a Gnip customer) produced the most useful aggregated map of official boundary & placemark data, coupled with social media delivered by Gnip (click the “Feeds” section along the left side to toggle social media); screenshot below.

Visionlink Gnip Social Media Map

With Gnip, they started displaying geo-located Tweets, then added Flickr photos with the flip of a switch. No new messy integrations that required learning a new API with all of its rate limiting, formatting, and delivery protocol nuances. Simple selection of data sources they deemed relevant to informing a community reacting, in real time, to a disaster.

It was great to see a firm focus on their core value proposition (official disaster relief data), and quickly integrate relevant social media without all the fuss.

Our thoughts are with everyone who was impacted by the fires.

Response Code Nuances

While fixing a bug yesterday, I plowed through the code that does Gnip’s HTTP response code special case handling. The scenarios we’re handling illustrate the complexities around doing integrations with many web APIs. It was a reminder of how much we all want standards to work, and how often they only partially do so. Here are a few nuances you should consider if you’re doing API integrations by hand.

“retry-after”

When doing a polling-based integration with a “real-time” API, you’re inclined to poll it a lot. That has caused some service providers to tell you to slow down using the “retry-after” HTTP header. Some providers use other, not so standard, ways to cool you down, but those are beyond the scope of this post. When you get a non-200-level response back from a server, you should consider looking for the retry-after header, even if the response was not a 503 or a 300-level code (the cases the HTTP 1.1 specification describes). Generally, when a service sends a retry-after, the intention behind it is clear, and you should respect the value that comes back. Now, the format of that value can be either “seconds”, or a more verbose time format that tells you what time to wait “until” before trying the request again. In practice, we’ve never seen the latter; only the “seconds” version. When we see retry-after, we sleep that duration; you should probably do the same.
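Handling both header formats takes only a few lines. The following is a sketch rather than Gnip’s actual code: `retry_after_seconds` is a hypothetical helper that turns either form of the header value into a number of seconds to sleep, using only the Python standard library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value to a non-negative number of seconds to wait.

    Returns None if the value is missing or cannot be parsed.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        # "Seconds" form, e.g. "120".
        return int(value)
    try:
        # Date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0, int((when - now).total_seconds()))
```

A caller would pass in the raw header value from the response and sleep for the returned number of seconds whenever the result is not None.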

HTTP Response-code ‘999’

You can look for it in the spec, but you won’t find it. Delicious likes to send a ‘999’ back when you’re hitting them too hard. Consider backing off for several minutes if you see this from them.

non-200 HTTP Response Bodies

While many services don’t bother sending response bodies back for non-200s (and those that do often don’t provide anything actionable), many do. It’s a good idea to write those bodies to a log file (or at least the first n-hundred bytes) for human inspection. There can be some useful information in there to help you build a more effective and efficient integration.
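As a rough sketch of that logging advice, the helper below keeps only the first few hundred bytes of an error body. The name `log_error_body`, the logger name and the 500-byte default are our assumptions for illustration.

```python
import logging

logger = logging.getLogger("integration.http")


def log_error_body(service, status, body, limit=500):
    """Log the first `limit` bytes of a non-2xx response body and return the logged text."""
    if 200 <= status < 300:
        return None
    # Decode leniently: error bodies are not guaranteed to be valid UTF-8.
    snippet = body[:limit].decode("utf-8", errors="replace")
    logger.warning("%s returned HTTP %d: %s", service, status, snippet)
    return snippet
```

Truncating before decoding keeps a runaway HTML error page from flooding the log while still leaving enough for a human to read.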

The matrix of services-to-response codes, and how you should respond to them, is big. The above is just a small slice of the scenarios your integrations will encounter, and that you’ll need to solve for.
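One way to keep that matrix manageable is a plain lookup table with per-service overrides. This is only a sketch: apart from the Delicious ‘999’ case described above, every entry, name and number here is an assumption for illustration.

```python
# (service, status code) -> (action, seconds). The Delicious 999 rule comes from
# this post; 300 seconds is an assumed value for "several minutes".
SERVICE_RULES = {
    ("delicious", 999): ("backoff", 300),
}

# Fallbacks by status code when no service-specific rule exists (assumed defaults).
DEFAULT_RULES = {
    503: ("backoff", 60),
}


def action_for(service, status):
    """Look up how to react to a response: ('ok' | 'backoff' | 'log', seconds)."""
    if 200 <= status < 300:
        return ("ok", 0)
    rule = SERVICE_RULES.get((service, status))
    if rule is None:
        # Unknown combinations are simply logged for a human to look at.
        rule = DEFAULT_RULES.get(status, ("log", 0))
    return rule
```

Keeping the special cases in data rather than in branching code makes it easier to add a new rule each time a service surprises you.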

While a service’s documentation is always some degree out of date, and you can only truly learn the behavioral characteristics through long nights of debugging, here are some pointers to service specific response codes that you might find useful.

Gnip Platform Update – Now For Authenticated Data Services

The Gnip Platform was originally built to support accessing public services and data. In response to customer requests, we soft launched support for authenticated data services over the summer, and now we have fully rolled out the new service. The difference between public and authenticated data services seems trivial, but in practice it is very important, since authenticated services represent either business-level arrangements between companies or private data access. The new Gnip capabilities support both of these scenarios.

As part of the new service Gnip also provides dedicated integration capacity for companies as we now are able to segment individually managed nodes on our platform for specific company accounts.   This means that a company with a developer key on Flickr, a whitelist account on Twitter, an application key on Facebook and a developer key on YouTube receives dedicated capacity on the Gnip platform to support all their data integration requirements.

Gnip will also continue to maintain the existing public data integration services, which do not require authentication for access and distribution, and we expect most companies will use a blend of our data integration services.

Using the new support for authenticated data service requires contacting us at sales@gnip.com so we can enable your account. Please contact us today to leverage your existing whitelisting or authenticated account on Flickr, YouTube, Twitter or other APIs and feeds.

Pushing and Polling Data: Differences in Approach on the Gnip Platform

Obviously we have some understanding of the concepts of pushing and polling data from service endpoints, since we basically founded a company on the premise that the world needed a middleware push data service. Over the last year we have had a lot of success with the push model, but we also learned that, for many reasons, we need to work with some services via a polling approach. For this reason our latest v2.1 includes the Gnip Service Polling feature, so that we can work with any service using push, poll or a mixed approach.

Now, the really great thing for users of the Gnip platform is that how Gnip collects data is mostly abstracted away. Every end-user developer or company has the option to tell Gnip where to push the data matching their filters or subscriptions. We also realize not everyone has an IT setup that can handle push, so we have always provided HTTP GET support that lets people grab data for their filters from a Gnip-generated URL.

One place where the way Gnip collects data can make a difference for our users, at this time, is the expected latency of data. Latency here refers to the time between the activity happening (e.g. Bob posted a photo, Susie made a comment) and the time it hits the Gnip platform to be delivered to our waiting users. Here are some thoughts to set basic expectations.

PUSH services: With push services, latency is usually under 60 seconds, but this is not always the case, since services can sometimes back up during heavy usage and latency can spike to minutes or even hours. Still, when the services that push to us are running normally, it is reasonable to expect 60 second latency or better, and this is consistent for both the Community and Standard Editions of the Gnip platform.

POLLED services: When Gnip is using our polling service to collect data, the latency can vary from service to service based on a few factors:

a) How often we hit an endpoint (say 5 times per second)

b) How many rules we have to schedule for execution against the endpoint (say over 70 million on YouTube)

c) How often we execute a specific rule (e.g. every 10 minutes). Right now, with the Community edition of the Gnip platform, we are setting rule execution by default at 10 minute intervals, and people need to keep this in mind when setting their expectations for data flow from any given publisher.
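To see how these three factors interact, here is a back-of-the-envelope sketch. It assumes a deliberately simplified model (one rule per request, rules executed round-robin against a single endpoint), which is not a description of how the Gnip polling service actually schedules work; the function names and numbers are hypothetical.

```python
def full_cycle_seconds(rule_count, polls_per_second):
    """Time to execute every rule once if each poll runs exactly one rule."""
    return rule_count / polls_per_second


def effective_interval_seconds(rule_count, polls_per_second, target_interval=600):
    """A rule cannot repeat faster than the full cycle, whatever interval is configured."""
    return max(target_interval, full_cycle_seconds(rule_count, polls_per_second))


# Hypothetical numbers: 6,000 rules at 5 polls per second, 10 minute target.
interval = effective_interval_seconds(6000, 5)
```

In this toy model, 6,000 rules at 5 polls per second need 20 minutes for a full pass, so the 10 minute target cannot be met without polling faster or prioritizing some rules over others.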

Expectations for POLLING in the Community Edition: I am sure some people who just read the above stopped and said “Why 10 minutes?” We chose to focus on “breadth of data” as the initial use case for polling. Also, the 10 minute interval is for the Community edition (aka the free version). We have the ability to turn the dial: using the smarts built into the polling service feature, we can execute the right rules faster (e.g. every 60 seconds or faster for popular terms, and every 10, 20 or more minutes for less popular ones). The key issue is that for very prolific posters or very common keyword rules (e.g. “obama”, “http”, “google”), more posts can appear in the 10 minute default time frame than we can collect in a single poll from the service endpoint.
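One way to picture “turning the dial” is a rule whose interval shrinks when a poll comes back full and grows when it comes back empty. The sketch below is purely hypothetical and is not the logic inside the Gnip polling service; `next_interval`, `page_limit` and the 20 minute ceiling are assumptions for illustration.

```python
MIN_INTERVAL = 60       # seconds; the fastest interval mentioned above
DEFAULT_INTERVAL = 600  # seconds; the Community edition default


def next_interval(current, results_returned, page_limit, max_interval=1200):
    """Pick the next polling interval for one rule from its last poll result."""
    if results_returned >= page_limit:
        # The poll was full, so posts were probably missed: poll twice as often.
        return max(MIN_INTERVAL, current // 2)
    if results_returned == 0:
        # Nothing new: back off, up to an assumed ceiling.
        return min(max_interval, current * 2)
    return current
```

A rule for a common keyword would drift toward the 60 second floor under this scheme, while a quiet rule would settle at the ceiling.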

For now the default expectation for our Community edition platform users should be a 10 minute execution interval for all rules when using any data publisher that is polled, which is consistent with the experience during our v2.1 Beta.    If your project or company needs something a bit more snappy with the data publishers that are polled then contact us at info@gnip.com or contact me directly at shane@gnip.com as these use cases require the Standard Edition of the Gnip platform.

Current pushed services on the platform include:  WordPress, Identi.ca, Intense Debate, Twitter, Seesmic,  Digg, and Delicious

Current polled services on the platform include:   Clipmarks, Dailymotion, deviantART, diigo, Flickr, Flixster, Fotolog, Friendfeed, Gamespot, Hulu, iLike, Multiply, Photobucket, Plurk, reddit, SlideShare, Smugmug, StumbleUpon, Tumblr, Vimeo, Webshots, Xanga, and YouTube

New Gnip Publishers: FriendFeed, YouTube and Hulu

We continue to push out new publishers to the beta http://api.gnip.com environment as we work to finish up the release and get the final touches on lots of new features.

The new publishers this week include the following:

  • FriendFeed-search:  Supports the KEYWORD rule-type and works with the standard FriendFeed Search interface for tracking conversations
  • Hulu: Supports the ACTOR rule-type and works with the standard Hulu interface for tracking conversations
  • Hulu-search: Supports the KEYWORD rule-type and works with the standard Hulu Search interface
  • YouTube: Supports the ACTOR and TAG rule-types and works with the standard YouTube interface and tracks “uploads”
  • YouTube-search: Supports the KEYWORD rule-type and works with the standard YouTube-search interface

Ok, now go grab some data from these or any of our other now 20+ data publishers in the system.   Or read up on the new features in http://www.gnip.com/docs

Gnip Pushed a New Platform Release This Week

We just pushed out a new release this week that includes new publishers and capabilities. Here is a summary of the release highlights. Enjoy!

  • New YouTube publisher: Do you need an easy way to access, filter and integrate YouTube content into your web application or website? Gnip now provides a YouTube publisher, so go create some new filters and start integrating YouTube-based content.
  • New Flickr publisher: Our first Flickr publisher had some issues with data consistency and could almost be described as broken. We built a brand new Flickr publisher to provide better access to content from Flickr. Creating filters is a snap so go grab some Flickr content.
  • Now publisher information can be shared across accounts: When multiple developers are using Gnip to integrate web APIs and feeds, it is sometimes useful to see other filters as examples. Sharing allows a user to see publisher activity and statistics, but does not grant the ability to edit or delete.
  • New Data Producer Analytics Dashboard: If your company is pushing content through Gnip, we understand it is important to see how, where and by whom the content is being accessed on our platform, and with this release we have added a web-based data producer analytics dashboard. This is a beta feature, not where we want it yet, and we have some incomplete data issues. However, we wanted to get something available and then iterate based on feedback. If you are a data producer, let us know how to take this forward. The current version provides access to the complete list of filters created against a publisher, and the information can be downloaded in XML or CSV format.

Also, we have a few things we are working on for upcoming releases:

  • Gnip Polling: Our new Flickr and YouTube publishers both leverage our new Gnip Polling service, which we have started using internally for access to content that is not available via our push infrastructure. We plan to make this feature available externally to customers in the future, so stay tuned or contact us if you want to learn more.
  • User generated publishers from RSS Feeds: We are going to open up the system so anyone can create new publishers from RSS Feeds. This new feature makes it easy to access, filter and integrate tons of web based content.
  • Field level mapping on RSS feeds: A lot of times the field naming of RSS feeds across different endpoints does not map to the way the field is named in your company. This new feature will allow the editing and mapping at the individual field level to support normalization across multiple feeds.
  • Filter rule batch updates: When your filters start to get big, adding lots of new rules can be a challenge. Based on direct customer feedback, it will soon be possible to batch upload filter rules.
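To make the field-level mapping idea above a little more concrete, here is a hypothetical sketch of what normalizing RSS field names might look like on the consuming side. The feature itself is not released, so the mapping, the target names and `normalize_item` are all assumptions for illustration, not a Gnip API.

```python
# Hypothetical mapping from source-specific RSS field names to one internal schema.
FIELD_MAP = {
    "pubDate": "published_at",
    "dc:creator": "author",
    "description": "body",
}


def normalize_item(item, field_map=FIELD_MAP):
    """Rename mapped fields; pass unmapped fields through unchanged."""
    return {field_map.get(key, key): value for key, value in item.items()}


# Made-up feed item for illustration.
item = {"title": "Hello", "pubDate": "Tue, 02 Dec 2008 10:00:00 GMT", "dc:creator": "bob"}
normalized = normalize_item(item)
```

With one mapping per feed, every downstream consumer can rely on the same field names regardless of where an item came from.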