Gnip Client Libraries from Our Customers

Our customers rock. When they develop code to start using Gnip, they often share their libraries with us so that those libraries might be useful to future Gnip customers as well. Although Gnip doesn’t currently officially support any client libraries for access to our social media API, we do like to highlight and bring attention to some of our customers who choose to share their work.

In particular, here are a few Gnip client libraries that happy customers have developed and shared with us. We’ll be posting them in our Power Track documentation, and you can also find them linked here:

Java
by Zauber
https://github.com/zaubersoftware/gnip4j

Python
by General Sentiment
https://github.com/vkris/gnip-python/blob/master/streamingClient.py

If you’ve developed a library for accessing Gnip data and you’d like to share it with us and with other Gnip customers, drop us a note at info@gnip.com. We’d love to hear from you.

Why You Should Join Gnip

Gnip’s business is growing heartily. As a result, we need to field current demand, refine our existing product offering, and expand into completely new areas in order to deliver the web’s data. From a business standpoint, we need to grow our existing sales team in order to capture as much of our traditional market as possible, as fast as possible. We also need to leverage established footholds in new verticals and turn those into businesses as big as, or hopefully bigger than, our current primary market. The sales and business-line expansion at Gnip is in full swing, and we need more people on the sales and business team to help us achieve our goals.

From a technical standpoint, I don’t know where to begin. We have a large existing customer base that we need to keep informed, help optimize, and generally support; we’re hiring technical support engineers. Our existing system scales just fine, but software was meant to iterate, and we have learned a lot about handling large volumes of real-time data streams, across many protocols and formats, for ultimate delivery to large numbers of customers. We want to evolve the current system to make even better use of computing resources and provide a more streamlined customer experience. We’ve also bitten off a historical data set indexing challenge that is, well… of truly historical proportions. The historical beast needs feeding, and it needs big brains to feast on. We need folks who know Java very well and have backgrounds in search, indexing, and large data-set management.

On the system administration side of things… if you like to twiddle iptables, tune MTUs to optimize high-bandwidth data flow across broad geographic regions, or handle high-volume, high-bandwidth streaming content, then we’d like to hear from you. We need even more sysadmin firepower.

Gnip is a technical product, with a technical sale. Our growth has us looking to offload a lot of the Sales Engineering support that the dev team currently takes on. Consequently, we’re looking to hire a Sales Engineer as well.

Gnip has a thriving business. We have a dedicated, passionate, intelligent team that knows how to execute. We’re building hard technology that has become a critical piece of the social media ecosystem. Gnip is also located in downtown Boulder, CO.

http://gnip.com/careers

The Only Constant is Change

As a few people have mentioned online, Gnip laid off seven team members today. It was a horrible thing to have to do, and my very best wishes go out to each team member who was let go. If you’re in Boulder and need a Java or PHP developer, an HR/office manager, or an inside salesperson, send an email to eric@gnip.com and I’ll connect you with some truly awesome people.

I would like to address a few specific points for our partners, customers and friends:

  1. We believe as strongly as ever in providing data aggregation solutions for our customers.  If we didn’t, we would have returned to our investors the year of funding we have in the bank (now two years).
  2. We are still delivering the same data as yesterday. The existing platform is highly stable and will continue to churn out data as long as we want it to.
  3. The changes in personnel revolve around rebuilding the technology stack to allow for faster, more iterative releases. We’ve been hamstrung by a technology platform that was built under a very different set of assumptions more than a year ago. While exceptionally fast and stable, it is also a beast to extend.  The next rev will be far more flexible and able to accommodate the many smart feature requests we receive.

To Alex, Shane, Ingrid, JL, Jenna, Chris and Jen: it has been an honor working with you, and I hope to have the privilege to do so again some day.

To our partners and customers, Gnip’s future is brighter than ever and we look forward to serving your social data needs for many years to come.

Sincerely,

Eric Marcoullier, CEO

Guest Post, Rick Boykin: Gnip C# .NET Convenience Library

Now that the new Gnip convenience libraries have been published for a few weeks on GitHub, I’m going to tell you a bit about the libraries that I’m currently responsible for: the .NET libraries. So, let’s dive in, shall we… The latest versions of the .NET libraries are heavily based on the previous version of the Java libraries, with a bit of .NET style thrown in. What that means is that I used Microsoft’s Java Language Conversion Assistant as a starting point, and mixed in some shell scripting with Bash, sed, and Perl to fix the comments and some of the messy parts that did not translate very well. I then made it more C#-like by removing Java annotations, adding .NET attributes, taking advantage of the native .NET XML serializer, utilizing System.Net.HttpWebRequest for communications, etc. It actually went fairly quickly. The next task was to start the unit testing deep dive.

I have to say, I really didn’t know anything about the Gnip model, how it worked, or what it really was at first. It just looked like an interesting project and some good folks. Unit testing, however, is one place where you learn about the details of how each little piece of a system really works. And since hardly any of my tests passed out of the gate (and I was not really convinced that I had enough tests in place), I decided it was best to go at it till I was convinced. The library components are easy enough. The code is really separated into two parts. The first component is the Data Model, or Resources, which directly map to the Gnip XML model and live in the Gnip.Client.Resource namespace. The second component is the Data Access Layer, or GnipConnection. The GnipConnection, when configured, is responsible for passing data to, and receiving data from, the Gnip servers. So there are really only two main pieces to this code. Pretty simple: Resources and GnipConnection. The other code is just convenience and utility code to help make things a little more orderly and to reduce the amount of code.
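
To make that two-part split concrete, here’s a minimal usage sketch. The constructor and method names below (Config, Create, GetActivities) are illustrative assumptions, not the library’s exact signatures; the point is simply that Resource objects carry the data while GnipConnection moves it to and from the servers.

```csharp
// Illustrative sketch only -- type and method names are assumptions,
// not the library's exact API.
using Gnip.Client;            // data access layer: GnipConnection
using Gnip.Client.Resource;   // data model: Publisher, Filter, Activity, ...

class GnipSketch
{
    static void Main()
    {
        // Credentials and server details would normally come from configuration.
        Config config = new Config("user@example.com", "secret");      // assumed type
        GnipConnection connection = new GnipConnection(config);

        // Resource objects are plain data that map directly to Gnip's XML.
        Publisher publisher = new Publisher("example-publisher");       // assumed ctor
        Filter filter = new Filter("my-filter");                        // assumed ctor

        // The GnipConnection handles the HTTP round trip to the Gnip servers.
        connection.Create(publisher, filter);                           // assumed method
        var activities = connection.GetActivities(publisher, filter);   // assumed method
    }
}
```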

So yeah, the testing… I used NUnit so folks could run the tests with the free version of Visual Studio, or even from the command line if you want. I included a Gnip.build NAnt file so that you can compile, run the tests, and create a zipped distribution of the code. I’ve also included an NUnit project file in the Gnip.ClientTest root (gnip.nunit) that you can open with the NUnit UI to get things going. To help configure the tests, there is an App.config file in the root of the test project that is used to set all the configuration parameters.

The tests, like the code, are divided into the Resource object tests and the GnipConnection tests (plus a few utility tests). The premise of the Resource object tests is to first ensure that the Resource objects are cool. These are simple data objects with very little logic built in (which is not to say that testing them thoroughly isn’t of the utmost importance). There is a unit test for each of the data objects, and they proceed by ensuring that the properties work properly, the DeepEquals methods work properly, and the marshalling to and from XML works properly. The DeepEquals methods are used extensively by the tests, so it is essential that we can trust them. As such, they are fairly comprehensive. The marshalling and un-marshalling tests are less so. They do a decent job; they just do not exercise every permutation of the XML elements and attributes. I do feel that they are sufficient to convince me that things are okay.
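
Here’s a sketch of the shape such a test takes. The Place type below is a stand-in written for illustration (the real Resource classes live in Gnip.Client.Resource), but the pattern is the same: round-trip the object through the .NET XML serializer and lean on DeepEquals to confirm nothing was lost.

```csharp
using System.IO;
using System.Xml.Serialization;
using NUnit.Framework;

// Stand-in resource type for illustration; real Resource classes expose DeepEquals
// and map directly to the Gnip XML model.
public class Place
{
    [XmlAttribute("value")]
    public string Value { get; set; }

    public bool DeepEquals(Place other)
    {
        return other != null && Value == other.Value;
    }
}

[TestFixture]
public class PlaceTest
{
    [Test]
    public void XmlRoundTripPreservesData()
    {
        Place original = new Place { Value = "boulder" };

        // Marshal to XML...
        XmlSerializer serializer = new XmlSerializer(typeof(Place));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, original);

        // ...then un-marshal it back.
        Place copy = (Place)serializer.Deserialize(new StringReader(writer.ToString()));

        // DeepEquals is what the rest of the suite leans on, so it gets exercised here.
        Assert.IsTrue(original.DeepEquals(copy));
    }
}
```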

The GnipConnection is responsible for creating, retrieving, updating and deleting Publishers and Filters, and for retrieving and publishing Activities and Notifications. There is also a mechanism built into the GnipConnection to get the time from the Gnip server and to use that value to calculate the time offset between the calling client machine and the Gnip server. Since the Gnip server publishes activities and notifications in one-minute-wide addressable ‘buckets’, it is nice to know what the time is on the Gnip server with some degree of accuracy. No attempt is made to adjust for network latency, but we get pretty close to predicting the real Gnip time. That’s it. That little bit is realized in 25 or so methods on the GnipConnection class. Some of those methods are just different signatures of methods that do the same thing, only with a more convenient set of parameters. The GnipConnection tests try to exercise every API call with several permutations of data. They are not completely comprehensive; there are a lot of permutations. But I believe they hit every major corner case.
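
To show the arithmetic behind that, here is a small self-contained sketch of the offset-and-bucket idea. It is an illustration of the concept, not the library’s actual code; in the real GnipConnection the server time comes back over the API, whereas here it is just a parameter.

```csharp
using System;

// Sketch of the time-offset and bucket arithmetic described above.
static class GnipTime
{
    // Offset to add to the local clock to approximate the Gnip server clock.
    // No attempt is made to correct for network latency.
    public static TimeSpan OffsetFrom(DateTime serverUtcNow)
    {
        return serverUtcNow - DateTime.UtcNow;
    }

    // Activities are published into one-minute-wide buckets, so the bucket a
    // moment belongs to is simply that moment (in server time) floored to the minute.
    public static DateTime BucketFor(DateTime localUtc, TimeSpan serverOffset)
    {
        DateTime serverTime = localUtc + serverOffset;
        return new DateTime(serverTime.Year, serverTime.Month, serverTime.Day,
                            serverTime.Hour, serverTime.Minute, 0, DateTimeKind.Utc);
    }
}
```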

In testing all this, one thing I wanted to do was run my tests and have the de-serialization of the XML validate against the XML Schema file I got from the good folks at Gnip. If I could de-serialize and then serialize a sufficiently diverse set of XML streams, while validating that those streams adhere to the XML Schema, then that was another bit of ammo for trusting that this thing works in situations beyond the test harness. In the Gnip.Client.Util namespace there is a helper class called XmlHelper that contains a singleton of itself. There is a property called ValidateXml that can be reached like this: XmlHelper.Instance.ValidateXml. Setting that to true will cause the XML to be validated any time it is de-serialized, either in the tests or from the server. It is set to true in the tests. But it doesn’t work with the stock XSD distributed by Gnip. That XSD does not include an element definition for each element at the top level, which is required when validating against a schema, so I had to create one that does. It is semantically identical to the Gnip version; it just pulls things out to the top level. You can find the custom version in the Gnip.Client/Xsd folder. By default it is compiled into the Gnip.Client.dll.
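
Turning validation on is a one-liner via the ValidateXml property described above. For reference, the second method below shows the stock .NET pattern this kind of validation ultimately relies on; the file paths there are purely illustrative.

```csharp
using System.Xml;
using Gnip.Client.Util;   // XmlHelper, as described above

class ValidationSetup
{
    static void EnableValidation()
    {
        // Validate every de-serialized XML stream, exactly as the tests do.
        XmlHelper.Instance.ValidateXml = true;
    }

    // For reference only: .NET schema validation boils down to reading the XML
    // through an XmlReader configured like this (paths are illustrative).
    static void ValidateByHand()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, "Gnip.Client/Xsd/gnip.xsd");
        using (XmlReader reader = XmlReader.Create("activities.xml", settings))
        {
            while (reader.Read()) { /* schema violations throw by default */ }
        }
    }
}
```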

One of the last things I did, which had nothing really to do with testing, was to create the IGnipConnection interface. Use it if you want. If you use some kind of Inversion of Control container like Unity, or if you like to code to interfaces, it should come in handy.
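
A quick sketch of how that interface is meant to be used: the consuming class depends only on IGnipConnection, so a mock or an IoC-registered instance can be injected. The method name used here is an assumption, not the interface’s exact signature.

```csharp
// Coding to the interface: this class never references the concrete GnipConnection,
// so tests or a container like Unity can supply any IGnipConnection implementation.
public class ActivityForwarder
{
    private readonly IGnipConnection _gnip;

    public ActivityForwarder(IGnipConnection gnip)   // constructor injection
    {
        _gnip = gnip;
    }

    public void Forward(Publisher publisher, Activity activity)
    {
        _gnip.Publish(publisher, activity);          // assumed method name
    }
}
```
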
That’s all for now. Enjoy!

Rick is a Software Engineer and Technical Director at Mondo Robot in Boulder, Colorado. He has been designing and writing software professionally since 1989, and has been working with .NET for the last 4 years. He is a regular fixture at the Boulder .NET user group meetings and is a member of Boulder Digital Arts.

We're Taking Part in the Boulder Job Fair — Would You Like a Free Trip to Check Out Boulder (and Gnip)?

Boulder has the most programmers per capita in the country.  It also has the healthiest people on the planet (I can’t back this one up with stats, just anecdotal evidence of 60-year-old grandmothers zooming up and down the mountains).  Basically, Boulder is the land of the mathlete, and it’s awesome!

A ton of local startups are competing for the best developers in Boulder and we’ve come to a common conclusion — we need to expand the pool of applicants.  It’s time to give developers living in the Bay Area, Boston and Bentonville a taste of the Boulder lifestyle and simultaneously introduce them to some of the coolest companies Boulder has to offer.

Are you a badass developer?  Do you code PHP or Java or C++ in your sleep?  Can you denormalize a database with your eyes closed or create elegant streams of CSS?  We’d like to meet you.  In fact, we’d like to fly you out to Boulder, all expenses paid, for a couple of days to meet some awesome companies, including Gnip, to see if there’s a love connection.  You’ll fly out on day one and spend time checking out the town, spend day two meeting with 20 killer tech companies, and then have a third day to follow up with the companies you like best before flying home.  Not a bad way to spend the last week of October.

If you’d like to know more, check out the additional details at Boulder.Me and then click the button to apply.

We’re looking forward to meeting you in Boulder next month.  We think you’ll dig the town as much as we do, and the companies are pretty rad, too.

The WHAT of Gnip: Changing APIs from Pull to Push

A few months ago a handful of folks came together and took a practical look at the state of “web services” on the network today. As an industry we’ve enjoyed the explosion of web APIs over the past several years, but it’s been “every man for himself,” and we’ve been left with hundreds of web APIs being consumed in random ways (random protocols and formats). There have been a few cracks at standardizing some of this, but most have been left in spec form with, at best, fragmented implementations, and most have been too high level to provide anything more than good bedtime reading. We set out to build something, not write a story.

For a great overview of the situation Gnip is plunging into, check out Nik Cubrilovic’s post on TechCrunchIT, “The New Datastream Aggregators, FriendFeed and Standards.”

Our first service is the culmination of lots of work by smart, pragmatic people. From day one we’ve had excellent partners helping us along the way; from early integrations with our API, to discussing specifications and standards to follow (or not to follow; what you choose not to do is often more important than what you choose to do). While we aspire to solve all of the challenges in the data portability space, we’re a small team biting off small chunks along a path. We are going to need the support, feedback, and assistance of the broader data portability (formal & informal) community in order to succeed. Now that we’ve finally launched, we’ll be in “release early, release often” mode to ensure tight feedback loops around our products.

Enough; what did we build!?!

For those who want to cut to the chase, here’s our API doc.

We built a system that connects Data Consumers to Data Publishers in a low-latency, highly-scalable standards-based way. Data can be pushed or pulled into Gnip (via XMPP, Atom, RSS, REST) and it can be pushed or pulled out of Gnip (currently only via REST, but the rest to follow). This release of Gnip is focused on propagating user generated activity events from point A to point B. Activity XML provides a terse format for Data Publishers to distribute their user’s activities. Collections XML provides a simple way for Data Consumers to only receive information about the users they care about. This release is about “change notification,” and a subsequent release will include the actual data along with the event.

As a Consumer, whether your application model is event- or polling-based, Gnip can get you near-realtime activity information about the users you care about. Our goal is a maximum 60-second latency for any activity that occurs on the network. While the time our service implementation takes to drive activities from end to end is measured in milliseconds, we need some room to breathe.
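
For the polling case, a consumer can be as simple as the sketch below. The endpoint URL is illustrative (the real resource paths are in the API doc linked above); the shape is the point: fetch the most recent one-minute bucket, then sleep and repeat, which lines up with the 60-second latency goal.

```csharp
using System;
using System.Net;
using System.Threading;

// Bare-bones polling consumer sketch; the URL below is a placeholder, not a real endpoint.
class PollingConsumer
{
    static void Main()
    {
        WebClient client = new WebClient();
        client.Credentials = new NetworkCredential("user@example.com", "secret");

        while (true)
        {
            // Hypothetical "current bucket" URL for a publisher's notifications.
            string url = "https://example-gnip-host/publishers/digg/notification/current.xml";
            string xml = client.DownloadString(url);

            Console.WriteLine("fetched {0} bytes at {1:u}", xml.Length, DateTime.UtcNow);

            // Poll once a minute to match the one-minute activity buckets.
            Thread.Sleep(TimeSpan.FromMinutes(1));
        }
    }
}
```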

Data can come into Gnip via many formats, but it is XSLT’d into a normalized Activity XML format, which makes consuming activity events (e.g. “Joe dugg a news story at 10am”) from a wide array of Publishers a breeze. Along the way we started cringing at the verb/activity overlap between various Publishers; did Jane “tweet” or “post”? They’re kinda the same thing. After sitting down with Chris Messina, it became clear that everyone else was cringing too. A verb/activity normalization table has been started, and Gnip is going to distill the cornucopia of activities into a common, community-derived format in order to make consumption even easier.
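
The payoff of normalization is that one small parser covers events from every Publisher. The element and attribute names in this sketch (at, action, actor) are assumptions for illustration; see the Activity XML definition in the API doc for the real schema.

```csharp
using System;
using System.Xml.Linq;

// Reading a normalized activity event such as "Joe dugg a news story at 10am".
class ActivityReader
{
    static void Main()
    {
        // Illustrative activity; attribute names are assumptions, not the published schema.
        string xml = "<activity at=\"2008-07-01T10:00:00Z\" action=\"dugg\" actor=\"joe\" />";

        XElement activity = XElement.Parse(xml);
        Console.WriteLine("{0} {1} something at {2}",
            (string)activity.Attribute("actor"),
            (string)activity.Attribute("action"),
            (DateTime)activity.Attribute("at"));
    }
}
```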

Data Publishers now have a central clearinghouse to push data to when events occur on their services. Gnip manages the relationship with Data Consumers and figures out which protocols and formats they want to play with. It will take a while for the system to reach equilibrium with Gnip, but once it does, API balance will be reached; Publishers will notify Gnip when things happen, and Gnip will fan out those events to an arbitrary number of Consumers in real time (no throttling, no rate limiting).

Gnip is centralized. After much consternation, we resolved to start out with a centralized model. Not necessarily because we think it’s the best path, but because it is the best path to get something started. Imagine the internet as a clustered application; decentralization is fundamental (DNS comes to mind). That said, we needed a starting point and now we have one. A conversation with Chris Saad highlighted some work Paul Jones (among others) had done around a standard mechanism for change notification discovery and subscription: getpingd. Getpingd describes a mechanism for distributed change notification. The Subscription side of getpingd feels like a no-brainer for Gnip to support, but I’m not sure how to consider the Discovery end of it. In some sense, I see Gnip (assuming getpingd’s discovery model is implemented) as a getpingd node in the graph. We have lots to consider in the federated/distributed model.

Gnip is a classic chicken-and-egg scenario: we need Publishers & Consumers to be interesting. If your service produces events that you want others on the network to consume, we’d love to see you as a Publisher in Gnip, pushing events into the system for wide consumption. If your service relies on events created by users on other applications, we’d love to see you as a Consumer in Gnip.

We’ve started out with convenience libraries for Perl, PHP, Java, Python, and Ruby. Rather than maintain these ourselves, we plan on publishing them to the respective language communities’ code sites and repositories.

That’s what we’ve built in a nutshell. I’ll soon blog about exactly how we’ve built it.