Jun 22

Some six months ago, out of curiosity, I used Wireshark to monitor the HTTP traffic my browser was generating while I casually browsed the web in Firefox with the StumbleUpon toolbar installed. The resulting traffic dump was pretty interesting: the StumbleUpon toolbar was calling home for every single URL I visited.

The extension was making an HTTP POST request to http://74.201.117.232/getmeta.php?username=<my_user_id> for every URL I visited.

[Image: StumbleUpon screenshot]

The use case for this is presumably to check whether I’ve rated the URL and, if so, what my rating was. But then, why not cache the data instead of calling home every time I open a page (even within the same browsing session)? I freaked out and uninstalled the toolbar that very moment, and that was pretty much it. I may have mentioned it in passing to a couple of friends. Most of them didn’t really care since they don’t use StumbleUpon, and those who did thought it was par for the course with all the social bookmarking/news toolbars, which is true. Almost every other toolbar does this too.

I thought about this for a while and figured the toolbars should hash the URLs before sending them to the servers, and use Bloom filters to reduce the number of times the client has to call home to check whether a user has rated a URL. And that was pretty much it, until last week.
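To make the Bloom filter idea concrete, here is a rough sketch in Python. Everything in it is my own illustration, not anything a real toolbar does: the `BloomFilter` class, its sizes, and the `needs_server_lookup` helper are all hypothetical. It assumes the server could ship the client a filter built from the URLs the user has already rated, so the client only calls home when the filter says "maybe".

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, size_bits=8192, num_hashes=4):
        # Sizes are arbitrary illustration values, not tuned for real traffic.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive one bit position per hash by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def needs_server_lookup(rated_filter, url):
    """Hypothetical helper: call home only if the URL might have been rated."""
    return url in rated_filter
```

With something like this, a URL the user never rated would almost always be answered locally, and the server would only hear about pages that (probably) already have a rating on file.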

kamathln told me that someone wanted to write a Jetpack extension for Tagz. That someone turned out to be Yathi, an old mutual friend of ours. He wanted to write a relatively simple Jetpack extension and needed some server-side support to get info on any URL (comments, points, number of saves, etc.). I quickly added the requisite server-side support and he quickly hacked together a nice little Jetpack script. It turned out to be one of the first few Jetpack extensions on userscripts.org, a couple of people started using it, and all was fine. Until I started looking at the server logs, when it felt like déjà vu all over again. Our script was leaking our users’ browsing history, much like what I’d observed with the StumbleUpon toolbar six months earlier.

Eventually, I decided that sending URLs in plain text is a bad idea. Also, the lookups should be cached at least for a short while. An extended form of the idea involved using Bloom filters, but that would have been too much work. So we now normalize the URL to a standard representation, hash it with SHA-256, and send the hash to the server. Although this is not a completely bulletproof solution, it’s certainly better than sending the URL to the server every time.
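The normalize, hash, and cache steps could look roughly like the Python sketch below. This is not the actual Jetpack script or the Tagz server code. The normalization rules (lowercase scheme and host, drop default ports and fragments), the five-minute TTL, and the names `normalize_url`, `url_digest`, and `LookupCache` are all assumptions of mine for illustration.

```python
import hashlib
import time
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Assumed normalization: lowercase scheme/host, drop default port and fragment."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    default_ports = {"http": 80, "https": 443}
    # Keep the port only when it is not the default for the scheme.
    if parts.port and parts.port != default_ports.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    return urlunsplit((scheme, host, path, parts.query, ""))


def url_digest(url):
    """SHA-256 hex digest of the normalized URL; this is what gets sent."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()


class LookupCache:
    """Remember lookups for a short while so repeat visits stay local."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # hypothetical callable: digest -> info
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get(self, url):
        key = url_digest(url)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = self._fetch(key)     # only the hash leaves the client
        self._entries[key] = (now, value)
        return value
```

The reason this is not bulletproof: a server that already knows a candidate URL can hash it the same way and compare, so popular pages are still recognizable. What the server no longer gets is a readable log of every URL, including ones it has never seen.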

Jun 16

This is probably the most inappropriately titled post on my blog. Maybe it should have been titled “Why I think I waste so much time on proggit and Hacker News” or “PR for my new feature on Tagz”.

For a long time I thought that people flock to social news sites to find new links pertinent to their interests (programming, compsci, math, economics … in my case). Going by this model, I thought the utility of the comments on these sites was only marginal compared to the utility of the inflow of new and interesting links. Lately I realized that this couldn’t be further from the truth. I hadn’t noticed that I tend to spend more time reading comments than reading the linked content. Seems like I’m more interested in what people have to say about the links than in the links themselves.

This reminded me of a pain point I’ve always had with social news sites. Social news sites are kinda like walled gardens. When I’m reading the comments on one of them, I’m missing out on a lot of interesting comments on other sites. And there’s no easy way (with a few clicks) to find discussions on all sites.

Since Tagz was written with the intent of solving the things that nagged me the most about social news and bookmarking sites, I decided to annotate all posts on Tagz with links to the corresponding comments pages on Delicious, Digg, Hacker News, Reddit and Twitter. Now, there are a few little inconsistencies: Delicious doesn’t have anything like a comments page, so I link to the URL info page. And due to the use of URL shorteners, there isn’t a way to directly search Twitter for links, so I link to the Backtweets search results page for the URL.

Initially, I didn’t want to add more clutter to the main page, so I kept these links only on every post’s comments and history page. But yesterday, I thought it’d be a better idea to just include those links on the main page. One of my more marketing-oriented friends even recommended against having them on the main page, since that’d likely increase the bounce rate. Honestly, I don’t really care about that, and I’ve always thought convenience takes precedence over everything else, so I just added them anyway.