Posts Tagged ‘data producers’

Posted in APIs, Customers, Partners, Publishers by Eric

We continue to work on enriching the Gnip schema to provide second-level meta-data on user-generated activities. Given that we push 10s to 100s of millions of activities daily, supporting more meta-data means a bit of work beyond just updating our schema.

Today we rolled out meta-data updates to the <actor> and <to> elements of the Gnip schema. The updates are new optional attributes that provide a place to map additional user information that is available on some social media services, such as Twitter. Initially we will add the new meta-data only to activities where the information is available inline. In the near future we will add more platform features to support the scenario where a second API call is required to add this meta-data to the activity.

Starting today the <actor> element has support for the numeric userID, friends, followers, and posts.   In addition, we are now mapping the fullname and username to individual attributes in order to better support services that allow end users to create custom screen names and change those names.   The <to> element was updated to provide a new attribute for numeric userID.

Overview of updates to <actor> and <to> elements of Gnip schema:

  • <actor> is the person who performed the activity on the service
    • <posts> is the number of updates made by the user
    • <followers> is the number of people following the user
    • <friends> is the number of people the user has friended
    • <fullname> is the descriptive name or screen name of the user
    • <username> is the username of the user on the service
    • <uid> is the unique numeric ID for the user on the service
    • <metaURL> is the user profile link on the service
  • <to> is the person to whom the activity is a response
    • <uid> is the unique numeric ID for the user on the service
    • <metaURL> is the user profile link on the service
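As an illustration of consuming the new optional meta-data, the sketch below reads the fields listed above off an <actor> element using Python's standard xml.etree.ElementTree. It assumes the fields arrive as attributes on <actor>, spelled exactly as in the list, and that the counts and uid are integers; the sample values are made up. Because every attribute is optional, the parser returns only the ones that are present.

```python
import xml.etree.ElementTree as ET

# Attribute names as listed above; treating the counts and uid as integers is an assumption.
NUMERIC_FIELDS = ("uid", "posts", "followers", "friends")
TEXT_FIELDS = ("fullname", "username", "metaURL")


def parse_actor(xml_text):
    """Return the <actor> text plus whichever optional attributes are present."""
    actor = ET.fromstring(xml_text)
    info = {"text": (actor.text or "").strip()}
    for name in TEXT_FIELDS:
        if name in actor.attrib:
            info[name] = actor.attrib[name]
    for name in NUMERIC_FIELDS:
        if name in actor.attrib:
            info[name] = int(actor.attrib[name])
    return info


# Made-up sample values, for illustration only.
sample = ('<actor uid="12345" username="jdoe" fullname="Jane Doe" '
          'followers="120" friends="80" posts="3400" '
          'metaURL="http://twitter.com/jdoe">jdoe</actor>')
print(parse_actor(sample)["followers"])  # 120
```

Activities from services that do not supply these fields simply yield a dictionary with just the actor text.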


Posted in APIs, Customers, Industry, Long, Partners, Publishers, Release, Strategy, solutions by Eric

This post is meant to provide a reminder and additional guidance for Gnip platform users as we transition to the new Twitter Streaming API at the end of the week. We have lots going on and want to make sure companies and developers are keeping up with the moving parts.

  • Friday, June 19th:  Twitter is turning off the original XMPP firehose that we have used as the default “Twitter Data Publisher” in the Community Edition of the platform.
  • Starting on Friday, June 19th the new default “Twitter Data Publisher” in the Community Edition of the platform will be integrated with the new “spritzer” tier of the Twitter Streaming API. Spritzer is a sample of the Twitter stream and not a “firehose”. This is the default, publicly available stream that Twitter allows Gnip to offer to anyone for integration.
  • All Gnip users will be able to access full-data filters with the updated Twitter Data Publisher
  • If your company has an authorized Twitter account for the gardenhose, shadow or birddog tiers and does not want to build and maintain this integration, contact us by email at info@gnip.com or shane@gnip.com to discuss how Gnip can provide a solution.

Helpful information about the new Twitter Streaming API:

PS:  The planned Facebook integration is coming along and we have our internal prototype completed.  Driving toward the beta and should have more details in the next week or two.

PPS: We would still appreciate any feedback people can provide on their Twitter data integration needs; please take the survey.

Posted in APIs, Customers, Partners, Publishers, Release by Eric

Last week we informed the community of our plans to transition to the new Twitter Streaming API (see the blog post). This post focuses on how Gnip Filters will be updated to support the new requirements of the Streaming API.

Here is a general summary of what Gnip users need to have in mind to prepare for the transition.

1) The Twitter Streaming API uses HTTP Basic Authentication to open up a connection.   The authentication requires the Twitter Username:Password combination, and the account access tier is set at the Twitter account level.
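For readers who have not worked with HTTP Basic Authentication, the sketch below shows what the Username:Password requirement amounts to on the wire: the credentials are joined with a colon, base64-encoded and sent in an Authorization header. The endpoint URL and credentials are placeholders, not real Twitter values, and the request is only built, never sent.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build an HTTP Basic Authentication header value (RFC 7617)."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_stream_request(url, username, password):
    """Prepare (but do not send) a GET request carrying Basic auth."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Placeholder endpoint and credentials; not a real Twitter URL.
req = build_stream_request("https://stream.example.com/spritzer.json",
                           "my_twitter_user", "my_password")
print(req.get_header("Authorization"))
```

Because the access tier is attached to the Twitter account, the same header construction applies regardless of tier; only the account behind the credentials changes.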

2) By default, Gnip will support the “spritzer” and “follow” tiers, as these are public and can be accessed by any valid Twitter account.

3) Developers and companies that have use cases which require higher levels of access (gardenhose, shadow, birddog) need to send an email directly to Twitter at api@twitter.com. The email should include basic information about your use case, the access level that is required (gardenhose, shadow, birddog), and the Twitter account to map the access.  Also,  Twitter has a new URL to request access for the gardenhose level.

Also, to provide a preview of what the new Gnip filters will provide we wanted to include some screen shots of what we are working on at this time.   (Also, you will notice the prototypes were built using an updated user experience we are working on for a future release)

Figure 1:  Gnip Filter Creation

This is the start page for creating a Gnip filter that will connect to the new Twitter Streaming API.   Users now will need to provide a valid Twitter account in order to support the HTTP Basic Authentication requirements of the API.


Figure 2: Gnip Filters will support the multiple tiers of the Twitter Streaming API

Twitter has multiple tiers for the Streaming API which will be supported in this update to the Gnip filters.  In the developer web app or at the Gnip API it will be possible to select the Streaming API tier that the filter will access.


Posted in APIs, Customers, Partners, Publishers by Eric

When we started Gnip last year Twitter was among the first group of companies that understood the data integration problems we were trying to solve for developers and companies.   Because Gnip and Twitter were able to work together it has been possible to access and integrate data from Twitter by using the Gnip platform since last July using Gnip Notifications, and since last September using Gnip Data Activities.

All of this data access was the result of Gnip working with the Twitter XMPP “firehose” API to provide Twitter data access for users of both the Gnip Community and Standard edition product offerings. Recently Twitter announced a new Streaming API and began an alpha program to start making the new API available. Gnip has been testing the new Streaming API and we are now planning to move from the current XMPP API to the new Streaming API in the middle of June. This transition will mean some changes in the default behavior and in the ability to access Twitter data, as described below.

New Streaming API Transition Highlights

  1. Gnip will now be able to provide both Gnip Notifications and Gnip Data Activities to all users of the Gnip platform.   We had stopped providing access to Data Activities to new customers last November when Twitter began working on the new API, but now all users of the Gnip platform can use either Notifications or Data Activities based on what is appropriate for their application use case.
  2. There are no changes to the Gnip API or to the service endpoints of Gnip Publishers and Filters due to this transition. The only change is the default Twitter API that Gnip integrates with for Twitter data. (added about 2 hours after original post)
  3. The Twitter Streaming API is meant to accommodate a class of applications that require near-real-time access to Twitter public statuses and is provided with several tiers of streaming API methods.  See the Twitter documentation for more information.
  4. The default Streaming API tiers that Gnip will be making available are the new “spritzer” and “follow” stream methods.   These are the only tiers which are made available publicly without requiring an end user agreement directly with Twitter at this time.
  5. The “spritzer” stream method is not a “firehose” like the XMPP stream that Gnip previously used as our default. Twitter is still working out the average messages per second, but at this time “spritzer” runs in the ballpark of 10-20 messages per second and can vary depending on many variables managed by Twitter.
  6. The “follow” stream method returns public statuses from a specified set of users, by ID.
  7. For more on “spritzer”, “follow”, and other methods see the Twitter Streaming API Documentation.
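To put the ballpark “spritzer” rate from item 5 into daily terms, the arithmetic below assumes, purely for illustration, that the rate were sustained around the clock; as noted above, the real rate varies.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def messages_per_day(messages_per_second):
    """Convert a sustained per-second message rate to a daily total."""
    return messages_per_second * SECONDS_PER_DAY


low, high = messages_per_day(10), messages_per_day(20)
print(f"{low:,} to {high:,} messages per day")  # 864,000 to 1,728,000 messages per day
```

That is roughly one to two million statuses per day, which helps explain why a sample stream behaves very differently from a firehose.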
What About Companies and Developers Whose Use Cases Are Not Met by the Twitter “Spritzer” and “Follow” Streaming API Methods?
Gnip and Twitter realize that companies want to use Twitter data in many ways and that new applications are being built every day. Therefore we are exploring how companies that are authorized by Twitter for other Streaming API methods could use the Gnip platform as their integration platform of choice.

Twitter has several additional Streaming API methods available to approved parties that require a signed agreement to access.   To better understand which developers and companies using the Gnip platform could benefit from these other Streaming API options we would encourage Gnip platform users to take this short 12 question survey: Gnip: Twitter Data Publisher Survey (URL: http://www.surveymonkey.com/s.aspx?sm=dQEkfMN15NyzWpu9sUgzhw_3d_3d)

What About the Gnip Twitter-search Data Publisher?
The Gnip Twitter-search Data Publisher is not impacted by the transition to the new Twitter Streaming API since it is implemented using the new Gnip Polling Service and provides keyword-based data integration to the search.twitter APIs.

We will share more information shortly, once we lock down the actual date for the transition. Please take the survey and, as always, contact us directly at info@gnip.com or send me a direct email at shane@gnip.com.

Posted in APIs, Customers, Publishers, Strategy, solutions by Eric

Obviously we have some understanding of the concepts of pushing and polling data from service endpoints, since we founded a company on the premise that the world needed a middleware push data service. Over the last year we have had a lot of success with the push model, but we also learned that, for many reasons, we need to work with some services via a polling approach. For this reason our latest release, v2.1, includes the Gnip Service Polling feature so that we can work with any service using push, poll or a mixed approach.

Now, the really great thing for users of the Gnip platform is that how Gnip collects data is mostly abstracted away. Every developer or company can tell Gnip where to push the data for the filters or subscriptions they have set up. We also realize that not everyone has an IT setup that can handle push, so we have always supported HTTP GET, which lets people grab data for their filters from a Gnip-generated URL.
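Receiving pushed data means running an HTTP endpoint that accepts POSTs. The sketch below is a minimal local receiver built on Python's standard http.server. The payload format, headers and path that Gnip actually uses are not described here, so the handler simply stores whatever body arrives; the simulate_push helper is a hypothetical stand-in for the sender, useful only for local testing.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # raw activity documents delivered to this endpoint


class ActivityPushHandler(BaseHTTPRequestHandler):
    """Accepts POSTed activity documents and acknowledges them with 200 OK."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        received.append(self.rfile.read(length).decode("utf-8"))
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_receiver(host="127.0.0.1", port=0):
    """Start the receiver in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), ActivityPushHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def simulate_push(host, port, body, path="/activities"):
    """Hypothetical sender used to exercise the receiver locally."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("POST", path, body=body.encode("utf-8"),
                 headers={"Content-Type": "application/xml"})
    status = conn.getresponse().status
    conn.close()
    return status
```

In production the endpoint would need to be publicly reachable, which is exactly the IT requirement that the HTTP GET option avoids.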

One place where the way Gnip collects data can currently make a difference for our users is the expected latency of data. Latency here refers to the time between the activity happening (e.g. Bob posted a photo, Susie made a comment) and the time it hits the Gnip platform to be delivered to our waiting users. Here are some basic expectations.

PUSH services: With push services the latency is usually under 60 seconds, but this is not always the case, since services can back up during heavy usage and latency can spike to minutes or even hours. Still, when the services that push to us are running normally, it is reasonable to expect latency of 60 seconds or better, and this is consistent for both the Community and Standard Editions of the Gnip platform.

POLLED services: When Gnip uses our polling service to collect data, the latency can vary from service to service based on a few factors:

a) How often we hit an endpoint (say 5 times per second)

b) How many rules we have to schedule for execution against the endpoint (say over 70 million on YouTube)

c) How often we execute a specific rule (e.g. every 10 minutes). Right now, with the Community edition of the Gnip platform, we set rule execution by default to 10-minute intervals, and people need to keep this in mind when setting expectations for data flow from any given publisher.
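Taking the illustrative numbers in a) and b) at face value (5 requests per second against one endpoint, roughly 70 million rules on YouTube), a quick back-of-the-envelope calculation shows why rules cannot all run on one fixed clock. It assumes one rule execution per request, which is a simplification.

```python
SECONDS_PER_DAY = 86_400


def requests_per_interval(requests_per_second, interval_seconds):
    """How many requests fit into one execution interval."""
    return requests_per_second * interval_seconds


def days_to_run_every_rule_once(rule_count, requests_per_second):
    """Days needed to execute each rule once, assuming one request per rule."""
    return rule_count / requests_per_second / SECONDS_PER_DAY


print(requests_per_interval(5, 10 * 60))                  # 3000
print(round(days_to_run_every_rule_once(70_000_000, 5)))  # 162
```

Under these assumptions only about 3,000 rules can execute in a 10-minute window, while a single pass over 70 million rules would take roughly 162 days, so rule execution has to be prioritized.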

Expectations for POLLING in the Community Edition: I am sure some people who just read the above stopped and said “Why 10 minutes?” Well, we chose to focus on “breadth of data” as the initial use case for polling. Also, the 10-minute interval is for the Community edition (aka the free version). We have the ability to turn the dial and use the smarts built into the polling service to execute the right rules faster (e.g. every 60 seconds or faster for popular terms and every 10, 20 or more minutes for less popular ones). The key issue is that for very prolific posters or very common keyword rules (e.g. “obama”, “http”, “google”) more posts can appear within the default 10-minute window than we can collect in a single poll from the service endpoint.
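The “turn the dial” idea, running popular rules more often than rare ones, can be sketched as a priority queue keyed on each rule's next due time. This is a hypothetical illustration using Python's heapq, not Gnip's polling service; the 60-second and 600-second intervals mirror the examples above, and a real scheduler would also have to respect the per-endpoint request rate.

```python
import heapq


def schedule(rules, horizon_seconds):
    """Simulate rule executions.

    rules maps a rule name to its execution interval in seconds.
    Returns (time, rule) pairs in execution order, for times before the horizon.
    """
    heap = [(0, rule) for rule in sorted(rules)]  # every rule is due at t=0
    heapq.heapify(heap)
    runs = []
    while heap and heap[0][0] < horizon_seconds:
        due, rule = heapq.heappop(heap)
        runs.append((due, rule))
        heapq.heappush(heap, (due + rules[rule], rule))  # reschedule by its interval
    return runs


runs = schedule({"obama": 60, "rare-term": 600}, horizon_seconds=600)
print(sum(1 for _, rule in runs if rule == "obama"))      # 10
print(sum(1 for _, rule in runs if rule == "rare-term"))  # 1
```

A heap keeps picking the next due rule cheap even with many rules; shortening a rule's interval is the “dial” that makes it run more often.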

For now the default expectation for our Community edition platform users should be a 10 minute execution interval for all rules when using any data publisher that is polled, which is consistent with the experience during our v2.1 Beta.    If your project or company needs something a bit more snappy with the data publishers that are polled then contact us at info@gnip.com or contact me directly at shane@gnip.com as these use cases require the Standard Edition of the Gnip platform.

Current pushed services on the platform include:  WordPress, Identi.ca, Intense Debate, Twitter, Seesmic,  Digg, and Delicious

Current polled services on the platform include:   Clipmarks, Dailymotion, deviantART, diigo, Flickr, Flixster, Fotolog, Friendfeed, Gamespot, Hulu, iLike, Multiply, Photobucket, Plurk, reddit, SlideShare, Smugmug, StumbleUpon, Tumblr, Vimeo, Webshots, Xanga, and YouTube

Posted in APIs, Industry, Partners, Publishers, Release by Eric

We are pleased to announce an early access program for a new Gnip data publisher to access and integrate data from the Facebook Platform Open Streams API.

Companies will realize immediate benefits from choosing to use the Gnip Platform for integrating data from Facebook.

  • Choose the specific Facebook users from among those that have authorized your applications and then Gnip will immediately begin collecting the relevant data, normalize it and deliver it in real-time to your applications.
  • Simplify the integration and data retention requirements for integrating with the Facebook Platform to your applications by using Gnip Notifications and Gnip Data Streams to work with and store either event meta-data or full-data based on the appropriate use case as defined by the Facebook Platform terms of use (i.e. the 24 hour rule, etc)

Developers and companies can sign up right now to be notified when the early access program is launched by sending an email to info@gnip.com with the subject: Facebook. Any company signing up for the early access program will be eligible for a free three-month subscription to the Gnip data publisher for the Facebook Platform once it is generally released. At this time the early access program is planned to launch in the summer.

And to provide a small taste of the upcoming integration here are two examples of what common Newsfeed actions on Facebook will look like when accessed via the planned Gnip data publisher.

1) Status update example (the fbids in this example were changed from the actual ones in my stream item)

<activities publisher="facebook">
<activity>
<at>2009-05-16T14:07:25.000Z</at>
<action>post</action>
<activityID>http://www.facebook.com/profile.php?aid=6&amp;id=12345&amp;ref=at</activityID>
<actor metaURL="http://www.facebook.com/people/Shane-Pearson/12345">Shane Pearson</actor>
<destinationURL>http://www.facebook.com/profile.php?id=12345&amp;story_fbid=12345</destinationURL>
<payload>
<body>It must be spring as my weekly trip to Lowes/Home Depot is back on the schedule</body>
</payload>
</activity>
</activities>

2) Upload photo example (the below Gnip data schema maps to a Facebook activity stream example)

<activities publisher="facebook">
<activity>
<at>2009-04-06T21:23:00-07:00</at>
<action>upload</action>
<activityID>http://www.facebook.com/album.php?aid=6&amp;id=499225643&amp;ref=at</activityID>
<actor metaURL="http://www.facebook.com/people/Snapshot-Smith/499225643">Snapshot Smith</actor>
<destinationURL>http://www.facebook.com/people/Snapshot-Smith/499225643</destinationURL>
<payload>
<title>Snapshot Smith uploaded a photo.</title>
<body><p><a href="http://www.facebook.com/photo.php?pid=28&amp;id=499225643&amp;ref=at" caption="A very attractive wall, indeed"></a></p>
</body>
<mediaURL type="thumbnail">http://photos-e.ak.fbcdn.net/photos-ak-snc1/v2692/195/117/499225643/s499225643_28_6861716.jpg</mediaURL>
<mediaURL type="content">http://www.facebook.com/photo.php?pid=28&amp;id=499225643&amp;ref=at</mediaURL>
</payload>
</activity>
</activities>

Posted in APIs, Publishers, Release by Eric

We just finished making a major upgrade to the beta api.gnip.com environment.  First, thank you to everyone for their patience during our middle of the day upgrade.  We normally schedule upgrades off hours or do a rolling upgrade, but tonight the entire team of ten is going to the Star Trek premiere.   Anyway, we made the “management” decision to do the upgrade earlier in the day so we did not run into our company/family event tonight.

What’s new? Up until now the data publishers in api.gnip.com have been doing lazy scheduling, which means they would pull data, but with a small number of rules it was actually easy to miss data or for a filter not to get a hit in our scheduling. Yeah, that is a beta thing, as the primary reason for having the beta out for so long was to give users a chance to move all their existing integrations to the new schema. With Beta 3 we have made some major enhancements to the system that, based on our tests, greatly improve the amount of data flowing across all our data publishers that use the new polling services.

From a timeline standpoint, we want to let the new features soak a bit and then we will lock down a date to take the system to production, possibly as early as the end of May. In the next few days we are running diagnostics and scaling out the system by putting it through its paces, so feel free to do the same or just work with our notification streams on any given publisher.

Live long and prosper.  (sorry, just could not resist the Star Trek line)

Posted in APIs, Publishers, Release by Eric

We continue to push out new publishers to the beta http://api.gnip.com environment as we work to finish up the release and get the final touches on lots of new features.

The new publishers this week include the following:

  • FriendFeed-search:  Supports the KEYWORD rule-type and works with the standard FriendFeed Search interface for tracking conversations
  • Hulu: Supports the ACTOR rule-type and works with the standard Hulu interface for tracking conversations
  • Hulu-search: Supports the KEYWORD rule-type and works with the standard Hulu Search interface
  • YouTube: Supports the ACTOR and TAG rule-types, works with the standard YouTube interface and tracks “uploads”
  • YouTube-search: Supports the KEYWORD rule-type and works with the standard YouTube-search interface
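The rule-type support listed above maps naturally onto a small lookup table. The sketch below is a hypothetical client-side check, not part of the Gnip API, that rejects a rule before it is submitted to a publisher that does not support its rule type; only the five publishers from this post are included.

```python
# Rule types per publisher, taken from the list above (this post's publishers only).
SUPPORTED_RULE_TYPES = {
    "FriendFeed-search": {"KEYWORD"},
    "Hulu": {"ACTOR"},
    "Hulu-search": {"KEYWORD"},
    "YouTube": {"ACTOR", "TAG"},
    "YouTube-search": {"KEYWORD"},
}


def validate_rule(publisher, rule_type):
    """Return True if the publisher supports the rule type (case-insensitive).

    Raises KeyError for publishers missing from the table.
    """
    allowed = SUPPORTED_RULE_TYPES.get(publisher)
    if allowed is None:
        raise KeyError(f"unknown publisher: {publisher}")
    return rule_type.upper() in allowed


print(validate_rule("YouTube", "tag"))    # True
print(validate_rule("Hulu", "KEYWORD"))   # False
```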

Ok, now go grab some data from these or any of the 20+ other data publishers now in the system. Or read up on the new features at http://www.gnip.com/docs


Posted in APIs, Customers, Long, Publishers, Strategy, solutions by Eric

This is one people have asked about a lot.   We just pushed out a new publisher today for Flickr.

The new Flickr Publisher supports the Gnip TAG rule-type and allows people to easily integrate data from the Flickr API using the Gnip platform. In the near future we plan to add support for the Gnip ACTOR rule-type, so stay tuned. In the meantime it is very easy to define the tags that match your interests. Not sure what tags to use? Just check out some of the most popular tags being used on Flickr.

Check it out on http://api.gnip.com and go use Gnip to integrate some data from Flickr!

Posted in APIs, Publishers by Eric

With our schema now finalized and in beta at http://demo.gnip.com, and the crowd-sourcing application launched to help us prioritize our publisher integration schedule, the team is now heads down building out more publishers on the Gnip platform.

Today we put ten new publishers into demo.gnip.com. All of them use the updated schema and support notifications and activities with full-data. Have fun integrating some data!

  1. Delicious
  2. Fotolog
  3. Plurk
  4. Reddit
  5. Slideshare (added after original blog post)
  6. Stumbleupon
  7. Tumblr
  8. Twitter-search
  9. Vimeo
  10. Webshots