Jun 16

This is probably the most inappropriately titled post on my blog. Maybe it should have been titled “Why I think I waste so much time on proggit and hacker news” or “PR for my new feature on tagz”.

For a long time I thought that people flock to social news sites to find new links pertinent to their interests (programming, compsci, math, economics … in my case). Going by this model, I thought the utility of the comments on these sites was only marginal compared to the utility provided by the inflow of new and interesting links. Lately I realized that this couldn’t be further from the truth: I tend to spend more time reading the comments than reading the linked content. It seems I’m more interested in what people have to say about the links than in the links themselves.

This reminded me of a pain point I’ve always had with social news sites: they’re kinda like walled gardens. When I’m reading the comments on one of them, I’m missing out on a lot of interesting comments on the others, and there’s no easy way (in a few clicks) to find the discussions on all of them.

Since Tagz was written with the intent of solving the things that nagged me most about social news and bookmarking sites, I decided to annotate every post on Tagz with links to its comments page on Delicious, Digg, Hacker News, Reddit and Twitter. There are a few small inconsistencies, though. Delicious doesn’t have anything like a comments page, so I link to its URL info page instead. And because of URL shorteners, there isn’t a way to search Twitter directly for a link, so I link to the Backtweets search results page for the URL.
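A minimal sketch of how such per-site discussion links might be generated for a post, in Python (what Tagz is written in). The URL templates below are hypothetical placeholders on the reserved `.example` domain, not the real Delicious, Digg, Hacker News, Reddit or Backtweets URL formats, which the post doesn’t give; `discussion_links` is likewise an illustrative name.

```python
from urllib.parse import quote

# Hypothetical placeholder templates: the real per-site URL formats are not
# given in the post, so these only illustrate the shape of the lookup.
DISCUSSION_URL_TEMPLATES = {
    "Delicious": "https://delicious.example/url-info?url={url}",  # URL info page
    "Digg": "https://digg.example/search?url={url}",
    "Hacker News": "https://news.example/search?url={url}",
    "Reddit": "https://reddit.example/info?url={url}",
    "Twitter": "https://backtweets.example/search?q={url}",  # via Backtweets
}


def discussion_links(post_url: str) -> dict[str, str]:
    """Map each site name to a page listing discussions of post_url."""
    # Percent-encode the whole URL (including ':' and '/') so it is safe
    # to embed as a single query-string value.
    encoded = quote(post_url, safe="")
    return {
        site: template.format(url=encoded)
        for site, template in DISCUSSION_URL_TEMPLATES.items()
    }


if __name__ == "__main__":
    for site, link in discussion_links("http://example.com/post?id=1").items():
        print(f"{site}: {link}")
```

Keeping the templates in one table means adding a site, or switching Twitter from one search service to another, is a one-line change.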

Initially, I didn’t want to add more clutter to the main page, so I kept these links only on each post’s comments and history pages. But yesterday I decided it’d be better to include them on the main page too. One of my more marketing-oriented friends recommended against it, since it would likely increase the bounce rate. Honestly, I don’t really care about that; I’ve always thought convenience takes precedence over everything else, so I added them anyway.

Apr 06

I’ve been using memcached for all the caching on Tagz. Redis is a relatively new key-value database that covers a superset of memcached’s functionality. One of the biggest problems I’ve had with memcached (actually, it has nothing to do with memcached itself) is that whenever I store a large data structure in it, deserializing (unpickling) it takes quite a while (only a couple of milliseconds, but it still counts).

This happens whenever I store large lists or dictionaries in memcached. Redis solves this problem by providing list and set primitives in addition to plain old strings, so a large list doesn’t have to be unpickled as a whole every time it is read. The list primitive also supports atomic push/pop commands, which can be used to implement efficient queues. That, along with the on-disk persistence feature, solves another problem I have with beanstalkd: its lack of persistence.
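To make the difference concrete, here is a small in-process sketch. `BlobCache` mimics the memcached pattern of storing a whole list as one pickled value, so reading even ten items means unpickling all of them; `ListStore` is a toy stand-in (not Redis itself) for Redis’s `LPUSH`/`RPOP`/`LRANGE` list commands, which redis-py exposes as `lpush`, `rpop` and `lrange` on a client. The class names and keys are hypothetical, and nothing here talks to a real server or persists to disk.

```python
import pickle


class BlobCache:
    """memcached-style cache: each value is stored as one pickled blob."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = pickle.dumps(value)

    def get(self, key):
        blob = self._data.get(key)
        # Reading any part of the value means unpickling all of it.
        return None if blob is None else pickle.loads(blob)


class ListStore:
    """Toy stand-in for Redis list commands; in-process, no persistence."""

    def __init__(self):
        self._lists = {}

    def lpush(self, key, *values):
        # Like LPUSH: each value goes to the head, so LPUSH k a b leaves [b, a].
        lst = self._lists.setdefault(key, [])
        for value in values:
            lst.insert(0, value)  # O(n) in this toy; Redis pushes at the ends cheaply
        return len(lst)

    def rpop(self, key):
        # Like RPOP: remove and return the tail item, or None if empty.
        lst = self._lists.get(key)
        return lst.pop() if lst else None

    def lrange(self, key, start, stop):
        # Like LRANGE: inclusive stop, negative indices count from the end.
        lst = self._lists.get(key, [])
        n = len(lst)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        if stop < start:
            return []
        return lst[start:stop + 1]


if __name__ == "__main__":
    recent = list(range(10_000))

    cache = BlobCache()
    cache.set("recent", recent)
    first_ten = cache.get("recent")[:10]  # unpickles all 10,000 items

    store = ListStore()
    for item in recent:
        store.lpush("recent", item)  # newest item ends up at the head
    newest_ten = store.lrange("recent", 0, 9)  # slices out only 10 items

    # LPUSH + RPOP gives a FIFO queue.
    store.lpush("jobs", "fetch-title", "update-feed")
    print(store.rpop("jobs"))  # fetch-title
```

With a real server the same calls would go through a redis-py client, and `BRPOP` (blocking pop) lets a worker wait for the next queued job instead of polling.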

Overall, this looks like a great solution to quite a few small performance problems I have on Tagz. I’m planning on playing with Redis a little more tonight, and if all goes well, I’ll switch from memcached to Redis.

Apr 01

Tagz has come a long way since I launched it last September. What began as a clean-room Django application has been accumulating a lot of cruft. One patch at a time, it has turned into an unmaintainable mess of a codebase.

In retrospect, I feel Python and Postgres weren’t really the best choices for writing Tagz. I believe Tagz would be better written in PHP with MySQL as the DB. I’ve learned the hard way that Django with PostgreSQL can’t quite match the blazing speeds possible with raw PHP and MySQL (with MyISAM tables).

Starting today, I’ve decided to stop all development on the current codebase of Tagz and have begun a rewrite in PHP. Current users can rest assured: backwards compatibility is an important goal for this rewrite. I’m hoping to finish it in less than a month, and I expect the transition to be a smooth one.

Finally, thanks to all the current users (no thanks to all the spammers) for the encouragement and the feature requests, without which Tagz would never have come close to what it is today.

Oct 02

Finally, after 2 hours of hard work, I got Tagz up and running. It’s currently running on my VPS (the same machine that was running the staging server).

Here’s the background on the issue.
The main server was running Ubuntu 8.04 (Hardy Heron).
Last night, I ran a standard apt-get update; apt-get upgrade.
It upgraded libc and libc-dev, and that was it. After that, all Perl processes would just hang, spinning busy on the CPU. I couldn’t do a thing. I tried running a couple of the offending scripts under strace, and they all hung on a clone() syscall. I tried restarting the AMI, but the problem persisted. The worst part was that I couldn’t even get a DB dump, because pg_dump wouldn’t work. And the last snapshot I had was about 17 hours old.

So here’s what I did. I terminated the postmaster process, took a backup of the DB data directory, scp’d it to my VPS and tried using it there. Then I found that I had to recompile PostgreSQL with --enable-integer-datetimes for it to accept the database (the data directory had been created by a build using integer datetimes). I did that and a few other tweaks (I’d switched DNS to point to the VPS early on), and here we have it, up and running.
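Before copying a data directory to a freshly built server, one way to check this in advance is to compare the `Date/time type storage` line that `pg_controldata` prints for the data directory with the target build’s configure flags (`pg_config --configure`). Below is a minimal Python sketch of the first half, assuming that line’s format; `uses_integer_datetimes` and `check_data_dir` are hypothetical helpers, and the latter needs `pg_controldata` on the PATH.

```python
import re
import subprocess
from typing import Optional


def uses_integer_datetimes(controldata_output: str) -> Optional[bool]:
    """Parse pg_controldata output; None if the storage line is absent."""
    match = re.search(
        r"^Date/time type storage:\s*(.+)$", controldata_output, re.MULTILINE
    )
    if match is None:
        return None
    # Integer builds report "64-bit integers"; float builds report
    # "floating-point numbers".
    return "integer" in match.group(1).lower()


def check_data_dir(data_dir: str) -> Optional[bool]:
    """Run pg_controldata (must be on PATH) against data_dir and parse it."""
    result = subprocess.run(
        ["pg_controldata", data_dir], capture_output=True, text=True, check=True
    )
    return uses_integer_datetimes(result.stdout)
```

If `check_data_dir` returns True, the server that will open the directory must also have been built with --enable-integer-datetimes.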

I’ve got to move back to EC2 soon (the VPS won’t be able to handle the load for long). But this time I’m going back to Debian Stable. I’ve had enough of Ubuntu; I have no idea how something as innocuous as a libc upgrade can barf things up so badly.

Oct 02

Due to a libc upgrade gone awry, tagz.in has been down for the past 30 minutes. I’m working on bringing it back up ASAP.

UPDATE: It’s up again, on a different machine.

Sep 25

Yesterday I added a new feature to Tagz: Subscriptions. Subscriptions let every registered user subscribe to a set of tags. Internally, we ‘AND’ all the tags within each subscription and ‘OR’ the results of the subscriptions. We also support stemming (as we do on every other feed). I see that a couple of users have already started using it. I just wish more people would post more links (people tend to keep their posts private) and comment more often.
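The AND/OR rule above can be sketched in a few lines of Python. This is an in-memory illustration only: the post doesn’t say how Tagz evaluates subscriptions (most likely in a database query), and `normalize` is a deliberately crude stand-in for real stemming that only lowercases a tag and drops a trailing plural “s”. The post dicts and function names are hypothetical.

```python
def normalize(tag: str) -> str:
    """Crude stand-in for stemming: lowercase and drop a plural 's'."""
    t = tag.lower()
    if t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return t


def matches(post_tags, subscriptions):
    """True if every tag of at least one non-empty subscription is on the post."""
    tags = {normalize(t) for t in post_tags}
    # AND the tags inside a subscription, OR across subscriptions.
    return any(
        len(sub) > 0 and all(normalize(t) in tags for t in sub)
        for sub in subscriptions
    )


def subscription_feed(posts, subscriptions):
    """Filter posts (dicts with a "tags" list) down to the subscribed ones."""
    return [p for p in posts if matches(p["tags"], subscriptions)]


if __name__ == "__main__":
    posts = [
        {"id": 1, "tags": ["python", "django"]},
        {"id": 2, "tags": ["Python"]},
        {"id": 3, "tags": ["haskell", "monads"]},
    ]
    subs = [["python", "django"], ["monad"]]
    print([p["id"] for p in subscription_feed(posts, subs)])  # [1, 3]
```

Post 2 is excluded because it has only one of the two tags in the first subscription, while post 3 matches the second subscription once “monads” is normalized to “monad”.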

Sep 02

We silently launched Tagz on September 1st, and I posted about it on proggit yesterday afternoon. The response has been pretty positive. We had some trouble initially; at one point I considered launching a second EC2 instance, but then the load dropped. It’s relatively quick to launch another instance, change some settings and set up round-robin DNS. I’ve also got a whole lot of feature requests to implement.

Aug 29

My dear brother Thilak met with a minor accident this afternoon, and in the confusion that ensued, he spilled the beans on Tagz. It must’ve been painful to type that 228-word post single-handedly (he’s got a cast on his right hand because of the accident). The UI is kinda crude, but functional. Actually, a couple of friends are already using/testing it. We plan to release it sometime soon, but I honestly wish he hadn’t made it public so early.

We’d been discussing this “`better` delicious reddit chimera” idea for quite some time now. Due to difficult personal circumstances over the past couple of months, I’ve been suffering from a terrible bout of insomnia. When the usual remedies (reading Nietzsche, driving through the city all night long, etc.) didn’t work, I started working on it. Then, on one of my infrequent visits to Mangalore, I showed a very crude prototype to Thilak and he was pretty enthusiastic about it. We set up a Redmine instance, moved the Mercurial repository to my VPS and were up and running, with a couple of commits every night.

We’ve got a long way to go before I can call it release-ready. Until then, all I can say is that it’s written in Python using Django, with PostgreSQL for the DB. And the `undumb` or `not dumb` (or whatever) tags thing he’s hinting at isn’t really all that smart; it’s just plain old tagging, with Porter stemming to identify similar tags.
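For a rough idea of how stemming groups similar tags, here is a Python sketch. It implements only a simplified version of the first step of Porter’s algorithm (plural removal, then dropping `-ed`/`-ing` and undoubling a final consonant), not the full stemmer mentioned above; a complete implementation such as NLTK’s `PorterStemmer` handles many more suffixes. `crude_stem` and `group_similar_tags` are hypothetical names.

```python
from collections import defaultdict

VOWELS = set("aeiou")


def crude_stem(word: str) -> str:
    """Simplified Porter step 1a/1b only; not the full algorithm."""
    w = word.lower()
    # Step 1a: plurals.
    if w.endswith("sses") or w.endswith("ies"):
        w = w[:-2]  # classes -> class, ponies -> poni
    elif w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]  # tags -> tag (but "class" stays)
    # Step 1b (simplified): drop -ing / -ed if the remaining stem has a vowel.
    for suffix in ("ing", "ed"):
        if w.endswith(suffix):
            stem = w[: -len(suffix)]
            if any(c in VOWELS for c in stem):
                w = stem
                # Undouble a final double consonant, except l, s and z.
                if (len(w) >= 2 and w[-1] == w[-2]
                        and w[-1] not in VOWELS and w[-1] not in "lsz"):
                    w = w[:-1]
            break
    return w


def group_similar_tags(tags):
    """Group tags that share a stem, preserving first-seen order."""
    groups = defaultdict(list)
    for tag in tags:
        groups[crude_stem(tag)].append(tag)
    return dict(groups)


if __name__ == "__main__":
    print(group_similar_tags(["programming", "programs", "Python", "tagging", "tags"]))
    # {'program': ['programming', 'programs'], 'python': ['Python'], 'tag': ['tagging', 'tags']}
```

Tags that land in the same group can then be treated as one tag when building feeds, which is all the “smart” tagging amounts to.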