Jeethu’s Blog

November 7, 2008

Merging Django querysets – redux

Filed under: Uncategorized — admin @ 5:35 am

In my previous post with the same title, frits commented, asking whether it would be possible to use filter(tags__in=l) instead of chaining filters.

I’d initially thought that chaining filters would generate a more efficient SQL query. This morning, I decided to test it.


Post.objects.filter(tags__text='python').filter(tags__text='django')

Generates the following SQL query.


SELECT "main_post"."id", "main_post"."link", "main_post"."title"
FROM "main_post"
INNER JOIN "main_post_tags" ON ("main_post"."id" = "main_post_tags"."post_id")
INNER JOIN "main_tag" ON ("main_post_tags"."tag_id" = "main_tag"."id")
INNER JOIN "main_post_tags" T4 ON ("main_post"."id" = T4."post_id")
INNER JOIN "main_tag" T5 ON (T4."tag_id" = T5."id")
    WHERE ("main_tag"."text" = 'python'  AND T5."text" = 'django' ) 

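Notice that it generates a total of 4 joins, including two on the M2M join table main_post_tags. Joining twice on main_tag is OK, and the 2nd join on main_post_tags looks like it could’ve been avoided, but it is needed: each row in main_post_tags links a post to a single tag, so matching two different tags on the same post takes two rows from that table.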
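To make the join behaviour concrete, here is a minimal sketch that runs the query above against an in-memory SQLite database. The table and column names come from the generated SQL; the sample rows, and the comparison query that reuses a single main_post_tags join, are made up for illustration.

```python
import sqlite3

# Schema inferred from the generated SQL; the rows are sample data.
SCHEMA_AND_ROWS = """
CREATE TABLE main_post (id INTEGER PRIMARY KEY, link TEXT, title TEXT);
CREATE TABLE main_tag (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE main_post_tags (id INTEGER PRIMARY KEY, post_id INTEGER, tag_id INTEGER);
INSERT INTO main_tag (id, text) VALUES (1, 'python');
INSERT INTO main_tag (id, text) VALUES (2, 'django');
INSERT INTO main_post (id, link, title) VALUES (1, 'http://example.com/a', 'both tags');
INSERT INTO main_post (id, link, title) VALUES (2, 'http://example.com/b', 'python only');
INSERT INTO main_post_tags (post_id, tag_id) VALUES (1, 1);
INSERT INTO main_post_tags (post_id, tag_id) VALUES (1, 2);
INSERT INTO main_post_tags (post_id, tag_id) VALUES (2, 1);
"""

# The query Django generated for the chained filters (4 joins).
CHAINED = """
SELECT "main_post"."id", "main_post"."link", "main_post"."title"
FROM "main_post"
INNER JOIN "main_post_tags" ON ("main_post"."id" = "main_post_tags"."post_id")
INNER JOIN "main_tag" ON ("main_post_tags"."tag_id" = "main_tag"."id")
INNER JOIN "main_post_tags" T4 ON ("main_post"."id" = T4."post_id")
INNER JOIN "main_tag" T5 ON (T4."tag_id" = T5."id")
WHERE ("main_tag"."text" = 'python' AND T5."text" = 'django')
"""

# Hypothetical variant: T5 hangs off the first main_post_tags join, no T4.
SINGLE_M2M_JOIN = """
SELECT "main_post"."id", "main_post"."link", "main_post"."title"
FROM "main_post"
INNER JOIN "main_post_tags" ON ("main_post"."id" = "main_post_tags"."post_id")
INNER JOIN "main_tag" ON ("main_post_tags"."tag_id" = "main_tag"."id")
INNER JOIN "main_tag" T5 ON ("main_post_tags"."tag_id" = T5."id")
WHERE ("main_tag"."text" = 'python' AND T5."text" = 'django')
"""

def run(sql):
    """Execute sql against a fresh in-memory database and return all rows."""
    db = sqlite3.connect(":memory:")
    try:
        db.executescript(SCHEMA_AND_ROWS)
        return db.execute(sql).fetchall()
    finally:
        db.close()

chained_rows = run(CHAINED)
single_join_rows = run(SINGLE_M2M_JOIN)
print(chained_rows)      # only the post carrying both tags
print(single_join_rows)  # empty: one join-table row points at one tag
```

With a single join on main_post_tags, the same tag_id would have to equal both the python and the django tag id, so nothing can match.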

Now, to try frits’ idea.


t1 = Tag.objects.get(text='python')
t2 = Tag.objects.get(text='django')
Post.objects.filter(tags__in=(t1,t2))

This generates a total of 3 queries, 2 to select the tags and one to select the posts. Now, if we ignore the cost of the first two queries (which are cheap, and the results can and should be cached in memory anyways), the final query has become way simpler and cheaper.


SELECT "main_post"."id", "main_post"."link", "main_post"."title"
FROM "main_post"
INNER JOIN "main_post_tags" ON ("main_post"."id" = "main_post_tags"."post_id")
    WHERE "main_post_tags"."tag_id" IN (1, 2)

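The number of joins is down from 4 to 1, which makes the final query a lot cheaper. One caveat: tags__in matches posts carrying any of the listed tags (and can return a post once per matching tag), whereas the chained filters only match posts carrying all of them, so the two aren’t drop-in replacements for each other.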
Thanks for the tip, frits.
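To check how the IN query behaves next to the chained filters, the sketch below runs it against an in-memory SQLite database with made-up rows (the schema follows the table names in the generated SQL). The GROUP BY / HAVING variant at the end is not something Django generated here; it is a hand-written assumption showing one way to keep a single join while still requiring every tag.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE main_post (id INTEGER PRIMARY KEY, link TEXT, title TEXT);
CREATE TABLE main_post_tags (id INTEGER PRIMARY KEY, post_id INTEGER, tag_id INTEGER);
INSERT INTO main_post (id, link, title) VALUES (1, 'http://example.com/a', 'both tags');
INSERT INTO main_post (id, link, title) VALUES (2, 'http://example.com/b', 'python only');
-- tag ids in this sample data: 1 = python, 2 = django
INSERT INTO main_post_tags (post_id, tag_id) VALUES (1, 1);
INSERT INTO main_post_tags (post_id, tag_id) VALUES (1, 2);
INSERT INTO main_post_tags (post_id, tag_id) VALUES (2, 1);
""")

# The query generated for filter(tags__in=(t1, t2)); ORDER BY added so the
# row order is stable for the comparison below.
IN_QUERY = """
SELECT "main_post"."id", "main_post"."link", "main_post"."title"
FROM "main_post"
INNER JOIN "main_post_tags" ON ("main_post"."id" = "main_post_tags"."post_id")
WHERE "main_post_tags"."tag_id" IN (1, 2)
ORDER BY "main_post"."id"
"""

# Hand-written variant (an assumption, not Django output): still one join,
# but only posts that matched both tag ids survive the HAVING clause.
ALL_TAGS_QUERY = """
SELECT "main_post"."id", "main_post"."link", "main_post"."title"
FROM "main_post"
INNER JOIN "main_post_tags" ON ("main_post"."id" = "main_post_tags"."post_id")
WHERE "main_post_tags"."tag_id" IN (1, 2)
GROUP BY "main_post"."id", "main_post"."link", "main_post"."title"
HAVING COUNT(DISTINCT "main_post_tags"."tag_id") = 2
"""

in_rows = db.execute(IN_QUERY).fetchall()
all_tags_rows = db.execute(ALL_TAGS_QUERY).fetchall()
db.close()

print(in_rows)        # the "both tags" post twice, plus the "python only" post
print(all_tags_rows)  # just the "both tags" post
```

With this data the IN query behaves like an OR over the tags, and it repeats a post once per matching tag unless the result is made distinct.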

November 4, 2008

Looking for a Job

Filed under: Django, Life, Python — Tags: , , , — admin @ 12:00 pm

For various reasons beyond the scope of this blog, I’ve had to quit my current job as a Programming Specialist at Position2. So, if you’re looking for a Python/Django hacker in Bangalore (although telecommuting is certainly an option I’d consider), drop me a line.

October 2, 2008

tagz.in is up again

Filed under: Linux, tagz — Tags: , , — admin @ 1:49 pm

Finally, after 2 hours of hard work, I got tagz up and running. It’s currently running on my VPS (the same machine that was running the staging server).

Here’s the background on the issue.
The main server was running Ubuntu 8.04 (Hardy Heron).
Last night, I ran a standard apt-get update; apt-get upgrade.
It upgraded libc and libc-dev, and that was it. After that, all perl processes would just hang, spinning on the CPU. I couldn’t do a thing. I tried running a couple of the offending scripts under strace, and they all hung on a clone() syscall. I tried restarting the instance, but the problem persisted. Worst of all, I couldn’t even get a db dump because pg_dump wouldn’t work, and the last snapshot I had was about 17 hours old.

So, here’s what I did. I terminated the postmaster instance, took a backup of the db directory, scp’d it to my VPS and tried using it there. Then I found that I had to recompile postgresql with --enable-integer-datetimes for it to accept the database. I did that and a few other tweaks (I’d switched DNS to point to the VPS early on) and here we have it, up and running.

I’ve got to move back to EC2 soon (the VPS won’t be able to handle the load for long). But this time, I’m going back to Debian Stable. I’ve had enough of Ubuntu; I have no idea how something as innocuous as a libc upgrade can barf things up so badly.

tagz.in is down :(

Filed under: tagz — Tags: — admin @ 11:40 am

Due to a libc upgrade gone awry, tagz.in has been down for the past 30 minutes. I’m working on bringing it up asap.

UPDATE: It’s up again, on a different machine.

October 1, 2008

Merging Django querysets

Filed under: Django, Python — Tags: — admin @ 8:20 pm

This one’s pretty basic (from the docs), but I end up using it all the time. Being able to “AND” and “OR” Django querysets can really simplify a lot of code. Here’s an instance (a simplification of the setup I have in tagz). Let’s start by defining 2 models.


from django.db import models

class Tag( models.Model ) :
    text = models.CharField(max_length=255)

class Post( models.Model ) :
    link  = models.URLField(max_length=2048)
    title = models.CharField(max_length=255)
    tags  = models.ManyToManyField(Tag)

Now, let’s say I want to get all Posts tagged as django or python.


qs = Post.objects.filter(tags__text='python') | Post.objects.filter(tags__text='django')

Now, as intuitive as it might seem, using AND to get the Posts tagged with both doesn’t seem to work.


qs = Post.objects.filter(tags__text='python') & Post.objects.filter(tags__text='django')
# qs.count() returns 0
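A likely explanation, though the SQL generated for this case isn't shown here, is that both conditions end up being applied to the same joined tag row, and no single tag can be 'python' and 'django' at once. The plain-Python sketch below models that with a made-up list of (post, tag) pairs standing in for the joined rows; none of it is Django API.

```python
# Each pair stands in for one row of the post/tag join (sample data).
joined_rows = [
    ("post-1", "python"),
    ("post-1", "django"),
    ("post-2", "python"),
]

def posts_with(tag):
    """Posts that have at least one joined row carrying the given tag."""
    return {post for post, t in joined_rows if t == tag}

# OR: either condition may hold, so testing a single row is enough.
either = {post for post, t in joined_rows if t == "python" or t == "django"}

# AND on the same row: one row's tag can never equal both values.
same_row_and = {post for post, t in joined_rows if t == "python" and t == "django"}

# AND across rows: each condition is matched against its own row.
across_rows_and = posts_with("python") & posts_with("django")

print(sorted(either))           # ['post-1', 'post-2']
print(sorted(same_row_and))     # []
print(sorted(across_rows_and))  # ['post-1']
```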

So, we end up chaining filters like this:


qs = Post.objects.filter(tags__text='python').filter(tags__text='django')

Which boils down to a simple for loop.


def filter_tags( tags ) :
    '''
    tags: a list of strings
    '''
    p = Post.objects.all()
    for t in tags :
        p = p.filter(tags__text=t)
    return p

September 25, 2008

Tagz update

Filed under: tagz — Tags: — admin @ 10:35 am

Just added a new feature to Tagz yesterday (Subscriptions). Subscriptions allow all registered users to subscribe to a set of tags. Internally we ‘AND’ all the tags for every subscription and ‘OR’ the results of every subscription. Additionally, we also support stemming (like we do on every other feed). I see that a couple of users have already started using it. I just wish more people would start posting more links (people tend to keep posts private) and commenting more often.
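The matching rule described above can be sketched in plain Python. This is only an illustration under assumptions: the data shapes, the function names, and the lowercase-only `normalize` stand-in for a real stemmer are all hypothetical and not taken from the Tagz code.

```python
def normalize(tag):
    """Placeholder for a real stemmer: here it only lowercases the tag."""
    return tag.lower()

def post_matches(post_tags, subscriptions):
    """True if the post carries every tag of at least one subscription.

    post_tags: iterable of tag strings on the post
    subscriptions: iterable of tag collections, one per subscription
    """
    have = {normalize(t) for t in post_tags}
    return any(
        {normalize(t) for t in wanted} <= have   # AND within a subscription
        for wanted in subscriptions              # OR across subscriptions
    )

subscriptions = [{"python", "django"}, {"postgresql"}]
print(post_matches(["Python", "Django", "orm"], subscriptions))  # True
print(post_matches(["python"], subscriptions))                   # False
print(post_matches(["PostgreSQL"], subscriptions))               # True
```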

September 2, 2008

Tagz is now live

Filed under: tagz — Tags: — admin @ 9:24 pm

We silently launched tagz on September 1st. I posted about it on proggit yesterday afternoon. The response has been pretty positive. We had some trouble initially, and I did consider launching a 2nd EC2 instance at one point, but then the load dropped. It’s relatively quick to launch another instance, change some settings and set up round-robin DNS. I’ve also got a whole lot of feature requests to implement.

August 31, 2008

I’ve just bought a MacBook

Filed under: Life — Tags: , , — admin @ 10:37 am

I’ve just bought a new MacBook and am busy setting up my dev environment on it. Things aren’t really as hard as I thought they’d be. With MacPorts, OS X doesn’t really feel much different, superficially, from any Linux distro.

August 29, 2008

It is true

Filed under: Uncategorized — Tags: , , — admin @ 5:59 pm

My dear brother Thilak met with a minor accident this afternoon, and in the confusion that ensued, he’s spilt the beans on Tagz. It must’ve been painful to type the 228-word post single-handedly (he’s got a cast on his right hand because of the accident). The UI is kinda crude, but functional. Actually, a couple of friends are already using/testing it. Well, we plan to release it sometime soon, but I honestly wish he hadn’t made it public so soon.

We’d been discussing this “`better` delicious reddit chimera” idea for quite some time now. Due to difficult personal circumstances in the past couple of months, I’ve been suffering from a terrible bout of insomnia. When the usual remedies for this (reading Nietzsche, driving through the city all night long, etc.) didn’t work, I started working on it. Then, on one of my infrequent visits to Mangalore, I showed a very crude prototype to Thilak and he was pretty enthusiastic about it. We set up a redmine instance, moved the mercurial repository to my VPS and we were up and running, with a couple of commits every night.

We’ve got a long way to go before I can call it release ready. Until then, all I can say is it’s written using Django and Python, with PostgreSQL for the db. And the `undumb` or `not dumb` (or whatever) tags thing he’s hinting about isn’t really all that smart; it’s just plain old tagging with Porter stemming to identify similar tags.

Low level template file caching in Django

Filed under: Django — Tags: — admin @ 4:44 pm

In one of my Django projects, I use a lot of recursive template tags, which seem to cause quite a bit of slowdown while rendering them. I looked at the code in django.template.loaders.filesystem:


def load_template_source(template_name, template_dirs=None):
    tried = []
    for filepath in get_template_sources(template_name, template_dirs):
        try:
            return (open(filepath).read().decode(settings.FILE_CHARSET), filepath)
        except IOError:
            tried.append(filepath)
    if tried:
        error_msg = "Tried %s" % tried
    else:
        error_msg = "Your TEMPLATE_DIRS setting is empty. Change it to point to at least one template directory."
    raise TemplateDoesNotExist, error_msg
load_template_source.is_usable = True

Looks like the template is reloaded from the filesystem every time a template is loaded. This gets really bad with custom template tags and inclusion tags. One option would be to keep the file in memory and, the next time it’s needed, call os.stat() on the file to check whether it has been modified, reloading from disk only if it has. In the end, I settled on a compromise: don’t reload templates in production mode, and disable the template cache in debug mode.
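For comparison, the os.stat() idea mentioned above could look something like the following. It is a standalone sketch using only the standard library, not the loader that was actually adopted; `read_cached` and `_cache` are hypothetical names, and it reads plain files rather than going through Django's template loaders.

```python
import os
import tempfile

_cache = {}  # path -> (mtime_ns, size, text)

def read_cached(path, encoding="utf-8"):
    """Return the file's text, rereading it only when mtime or size changed."""
    st = os.stat(path)
    key = (st.st_mtime_ns, st.st_size)
    hit = _cache.get(path)
    if hit is not None and hit[:2] == key:
        return hit[2]                 # unchanged on disk: serve from memory
    with open(path, encoding=encoding) as f:
        text = f.read()
    _cache[path] = key + (text,)
    return text

# Small demonstration with a throwaway file.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "base.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<p>v1</p>")
    first = read_cached(path)
    second = read_cached(path)          # same stat result, so no reread
    with open(path, "w", encoding="utf-8") as f:
        f.write("<p>version 2</p>")     # size changes, so the stat check notices
    third = read_cached(path)
```

This still costs one stat() call per lookup, whereas the compromise described above skips the disk entirely outside debug mode.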

Here’s template_cache.py


# -*- coding: utf-8 -*-

from django.template import loader, TemplateDoesNotExist
from django.conf import settings

template_cache = {}    # template_name -> result of find_template_source; never invalidated
def cached_loader( template_name, template_dirs=None ) :
    t = template_cache.get(template_name)
    if t is None :
        old_loaders = settings.TEMPLATE_LOADERS[:]
        # Drop the first loader (assumed to be cached_loader itself) so the
        # lookup below doesn't recursively call cached_loader
        settings.TEMPLATE_LOADERS = old_loaders[1:]
        loader.template_source_loaders = None    # make Django rebuild its loader list
        try :
            template_cache[template_name] = t = loader.find_template_source( template_name, template_dirs )
        finally :
            # Restore the full loader list, even when the template wasn't found
            settings.TEMPLATE_LOADERS = old_loaders
            loader.template_source_loaders = None
    return t
cached_loader.is_usable = not settings.DEBUG    # Avoid caching in debug mode
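
To use it, the loader has to be listed first in TEMPLATE_LOADERS, since the code above strips the first entry before delegating to the remaining loaders. A settings fragment along these lines should work with the function-based loaders shown above; the module path `template_cache.cached_loader` assumes template_cache.py sits directly on the Python path, so adjust it to wherever the file lives in your project.

```python
# settings.py (fragment)
TEMPLATE_LOADERS = (
    'template_cache.cached_loader',   # must come first; assumed import path
    'django.template.loaders.filesystem.load_template_source',
    'django.template.loaders.app_directories.load_template_source',
)
```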