Controlling Data Through URL Shorteners
2009 30
I’m going to sidestep the “URL shorteners are bad because they obfuscate” discussion in this post. If you’re reading this, you likely have an opinion one way or another on that topic, but let’s leave that at the door. A bigger challenge is emerging as URL shortening continues to proliferate.
A web browser unwinding a shortened URL when a user clicks on it is one thing, but when system software tries to unwind (resolve) shortened URLs en masse, a problem emerges. The database that binds each short URL to its long version is hidden behind an API that can’t handle, or won’t allow (and I’m pointing at all of you URL shorteners out there), bulk unwinding. The result is a bottleneck, the URL shortening services themselves, that prevents “real-time” indexing of otherwise publicly available content. “Classic” offline, crawl-based search engines (e.g. Google, Yahoo!) will likely unwind in a delayed, “offline” manner, prioritized by relevance. Real-time search facilities, however, have to unwind large numbers of shortened URLs on the fly, and there doesn’t appear to be a way to do that as the volume and rate of shortened URLs in daily social activity keeps increasing.
If your business relies on unwinding large volumes of shortened URLs in real time, you’re faced with the usual optimization suspects: caching, and resolution prioritized by relevance. These will improve your ability to “keep up”, but they depend on cache hit ratios (which are generally poor in the social space when it comes to URL unwinding) and on your own ability to decide what to unwind from an ever-increasing volume of shortened URLs.
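To make those two suspects concrete, here is a minimal Python sketch of cache-first, priority-ordered unwinding under a fixed lookup budget. Everything in it is an assumption for illustration: the `Unwinder` class, the relevance scores, the per-batch `budget`, and the `fetch` callable, which stands in for whatever single-URL lookup a shortener actually allows (for example, one HTTP request that reads the redirect target). It is not any particular service's API.

```python
import heapq
from collections import OrderedDict


class Unwinder:
    """Cache-first, priority-ordered unwinding of short URLs (illustrative only)."""

    def __init__(self, fetch, cache_size=10000):
        # fetch: hypothetical callable, short URL -> long URL, or None on failure.
        self.fetch = fetch
        self.cache_size = cache_size
        self.cache = OrderedDict()  # short URL -> long URL, kept in LRU order
        self.hits = 0
        self.misses = 0

    def _remember(self, short, long_url):
        self.cache[short] = long_url
        self.cache.move_to_end(short)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def unwind_batch(self, scored, budget):
        """scored: iterable of (relevance, short_url); budget: max fetch calls."""
        resolved = {}
        pending = []
        for score, short in scored:
            if short in self.cache:
                self.hits += 1
                self.cache.move_to_end(short)
                resolved[short] = self.cache[short]
            else:
                self.misses += 1
                # heapq is a min-heap, so negate the score to pop the best first.
                heapq.heappush(pending, (-score, short))
        attempted = set()
        while pending and budget > 0:
            _, short = heapq.heappop(pending)
            if short in attempted:
                continue  # same short URL appeared twice in this batch
            attempted.add(short)
            budget -= 1
            long_url = self.fetch(short)
            if long_url is not None:
                self._remember(short, long_url)
                resolved[short] = long_url
        return resolved

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Anything still pending when the budget runs out is simply dropped for that batch, which is exactly the “deciding what to unwind” problem: the lower the hit ratio, the more of the budget goes to the shortener's API and the more URLs go unresolved.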
The result is another case of data control. If adoption of URL shorteners and vanity hosts/URLs continues, and all URLs turn into redirects, we become completely dependent on services that appear unwilling to open up their databases. If URL shortening is becoming a de facto standard, I would like the ability to unwind in bulk to be part of it.
