Controlling Data Through URL Shorteners

June 30th, 2009
  • Posted by Jud Valeski, Co-Founder and CEO

I’m going to sidestep the “URL shorteners are bad because they obfuscate” discussion in this post. If you’re reading this, you likely have an opinion one way or another on that topic, but let’s leave that at the door. A bigger challenge is emerging as URL shortening continues to proliferate.

Web browsers unwinding a shortened URL when a user clicks on one is one thing. System software trying to unwind (resolve) shortened URLs en masse is another, and that is where a problem emerges. The database that maps each short URL to its long version is hidden behind an API that can’t handle, or won’t allow (and I’m pointing at all of you URL shorteners out there), bulk unwinding of shortened URLs. The result is a bottleneck (the URL shortening services themselves) that prevents “real-time” indexing of otherwise publicly available content. “Classic” offline, crawl-based search engines (e.g., Google and Yahoo!) will likely unwind in a latent, “offline” manner, prioritized by relevance. Real-time search facilities, however, have to unwind large numbers of shortened URLs on the fly, and there doesn’t appear to be a way to keep up as the volume and rate of shortened URLs in daily social activity keep increasing.
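To make the bottleneck concrete: unwinding a short URL generally means asking the shortening service over HTTP and following its redirect (a 3xx response carrying a `Location` header), which costs one network round trip per hop, per URL. The Python below is a minimal sketch under that assumption, not any particular shortener’s API. It assumes the service answers `HEAD` requests with the redirect; `head_location` and `unwind` are hypothetical names, and the `fetch` parameter exists only so the resolver can be swapped out (for example, with a dictionary in tests).

```python
from http.client import HTTPConnection, HTTPSConnection
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit


def head_location(url: str, timeout: float = 5.0) -> Optional[str]:
    """Hypothetical helper: send one HEAD request and return the Location
    header if the reply is a 3xx redirect, else None (assumes HEAD is honored)."""
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        if 300 <= resp.status < 400:
            return resp.getheader("Location")
        return None
    finally:
        conn.close()


def unwind(
    url: str,
    fetch: Callable[[str], Optional[str]] = head_location,
    max_hops: int = 10,
) -> str:
    """Follow redirects until a non-redirect, a loop, or max_hops is reached."""
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            break  # redirect loop: stop at the repeated URL
        seen.add(url)
        location = fetch(url)  # one round trip to the shortener per hop
        if location is None:
            return url  # not a redirect, so this is the long URL
        url = urljoin(url, location)  # Location may be relative
    return url
```

Every call to `unwind` pays at least one request to the shortening service, which is why resolving millions of URLs a day this way runs straight into the rate and volume limits described above.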

If your business relies on unwinding large volumes of shortened URLs in real time, you’re faced with the usual optimization suspects: caching and relevance- or priority-based resolution. These will improve your ability to “keep up,” but they are limited by cache hit ratios (which are generally poor in the social space when it comes to URL unwinding) and by your own ability to decide what to unwind in an ever-increasing volume of shortened URLs.
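The caching and prioritization strategies above can be sketched roughly as follows. This is an illustrative assumption, not Gnip’s implementation: `CachedUnwinder`, the per-URL relevance scores, and the per-batch `budget` of remote lookups are all hypothetical. Cache hits are served immediately; misses are resolved in descending relevance order until the budget runs out, and anything left over is simply not unwound in that batch.

```python
import heapq
from collections import OrderedDict
from typing import Callable, Dict, List, Optional, Tuple


class CachedUnwinder:
    """Hypothetical sketch: an LRU cache in front of a resolver, plus
    relevance-ordered resolution of cache misses under a lookup budget."""

    def __init__(self, resolve: Callable[[str], str], capacity: int = 100_000) -> None:
        self.resolve = resolve    # maps a short URL to its long URL (assumed)
        self.capacity = capacity  # maximum number of cached mappings
        self.cache: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _lookup(self, short: str) -> Optional[str]:
        # Serve from cache and mark the entry as most recently used.
        if short in self.cache:
            self.cache.move_to_end(short)
            self.hits += 1
            return self.cache[short]
        return None

    def _store(self, short: str, long_url: str) -> None:
        # Insert or refresh, then evict the least recently used entry if over capacity.
        self.cache[short] = long_url
        self.cache.move_to_end(short)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def process(self, batch: List[Tuple[float, str]], budget: int) -> Dict[str, str]:
        """Unwind (relevance, short_url) pairs, making at most `budget` remote lookups."""
        resolved: Dict[str, str] = {}
        pending: List[Tuple[float, str]] = []
        for relevance, short in batch:
            cached = self._lookup(short)
            if cached is not None:
                resolved[short] = cached
            else:
                pending.append((-relevance, short))  # negate: heapq is a min-heap
        heapq.heapify(pending)
        while pending and budget > 0:
            _, short = heapq.heappop(pending)
            if short in resolved:
                continue  # duplicate of a URL already unwound in this batch
            long_url = self.resolve(short)
            self.misses += 1
            self._store(short, long_url)
            resolved[short] = long_url
            budget -= 1
        return resolved  # lower-relevance misses beyond the budget stay unresolved

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

`resolve` can be any function that maps a short URL to its long form, such as one that follows HTTP redirects. The `hit_ratio` method is where the limitation shows up: when most short URLs in a stream are seen only once, hits stay rare, and the budget (i.e., the shortening services) remains the ceiling on how much you can unwind.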

The result is another case of data control. If adoption of URL shorteners and vanity hosts/URLs continues, and all URLs turn into redirects, we will be completely dependent on services that appear unwilling to open up their databases. I would like this emerging standard to include the ability to unwind in bulk.
