01 Jul
I was looking through Lucene’s source code today (okay, tonight) to find out whether you could give Lucene hints to change the clause precedence during query execution. Unfortunately, I found that Lucene does not let users supply any such hint (I was looking at ConjunctionScorer).
At work, we have a use case where we know something about the data that gets indexed into the system. After query classification and pre-processing, we can use this knowledge to tell Lucene what we think the execution order of each clause in the query should be. This could drastically improve query performance, mostly for AND queries where, run independently, one clause would return a handful of results and another would return thousands or even tens of thousands of results.
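To make the ordering argument concrete, here is a minimal sketch (not Lucene’s code) of intersecting the matches of an AND query. It assumes each clause’s matching doc ids are available as a sorted Python list and uses the list length as the cardinality estimate: the rarest clause drives the loop, and the other clauses only skip forward to its candidates with a binary search.

```python
from bisect import bisect_left


def intersect(postings: list[list[int]]) -> list[int]:
    """Intersect sorted doc-id lists, driving iteration from the rarest clause."""
    if not postings:
        return []
    # Order clauses by cardinality: the shortest list becomes the lead clause.
    ordered = sorted(postings, key=len)
    lead, others = ordered[0], ordered[1:]
    # Current position in each non-lead clause; positions only move forward.
    positions = [0] * len(others)
    result = []
    for doc in lead:
        matched = True
        for i, plist in enumerate(others):
            # Skip ahead in this clause to the first doc id >= the candidate.
            positions[i] = bisect_left(plist, doc, positions[i])
            if positions[i] == len(plist):
                # This clause is exhausted, so no later candidate can match.
                return result
            if plist[positions[i]] != doc:
                matched = False
                break
        if matched:
            result.append(doc)
    return result
```

For example, `intersect([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [3, 7, 9], [2, 3, 7, 8]])` returns `[3, 7]`. With one clause matching 3 documents and another matching 10,000, the loop runs at most 3 times with a logarithmic skip into the large list, instead of walking the large list; a bad choice of lead clause reverses that cost.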
In the past, I have worked on IR systems that maintained cardinalities (statistics) about the index, which helped optimize the query and produce the best plan possible within the given time and resources.
Does anyone know whether Lucene maintains these cardinalities internally? If so, how does it impact the query execution plan?
02 Mar
django-reviews: a generic review application for Django projects that lets you associate any number of reviews with any Model instance and makes retrieving those reviews simple.
I dug around the Django space and couldn’t find a generic review application that would let users add a review (including ratings) to an entity, so I decided to write one myself. The current application is modeled on Django’s comments framework. The screenshot above shows what you can use the application for.
20 Feb
This snippet lets you log in to your Django applications with either a username or an email address. All you have to do is add the middleware and register it in your authentication backends.
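The snippet itself isn’t reproduced here, but the core idea can be sketched without Django: check whether the login identifier looks like an email address and look the user up by the matching field. In a real project you would typically subclass `django.contrib.auth.backends.ModelBackend`, query `User.objects` (for example with `email__iexact`), and list the class in the `AUTHENTICATION_BACKENDS` setting. The in-memory `User` class, the `make_user` helper, and the unsalted SHA-256 hashing below are assumptions made only so the sketch runs on its own.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    username: str
    email: str
    password_hash: str

    def check_password(self, raw_password: str) -> bool:
        # Illustration only: real Django hashes are salted (use user.check_password).
        return hashlib.sha256(raw_password.encode()).hexdigest() == self.password_hash


def make_user(username: str, email: str, raw_password: str) -> User:
    """Hypothetical helper that stores a hashed password."""
    return User(username, email, hashlib.sha256(raw_password.encode()).hexdigest())


class EmailOrUsernameBackend:
    """Treat the identifier as an email if it contains '@', else as a username."""

    def __init__(self, users: list[User]) -> None:
        self.users = users  # stands in for User.objects in this sketch

    def authenticate(self, request, username: Optional[str] = None,
                     password: Optional[str] = None) -> Optional[User]:
        if username is None or password is None:
            return None
        for user in self.users:
            if "@" in username:
                # Emails are matched case-insensitively.
                found = user.email.lower() == username.lower()
            else:
                found = user.username == username
            if found:
                return user if user.check_password(password) else None
        return None
```

Returning `None` instead of raising matters in Django: it lets the framework fall through to the next backend listed in `AUTHENTICATION_BACKENDS`.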
09 Jan
For those of you who want to run MapReduce jobs using Hadoop on your Mac, here is a great article that guides you through the process step by step.
04 Nov
Anyone who has worked on a decent-sized product with Django serving as the web front end understands how monotonous the update/deployment procedure gets, especially when you have more than a couple of servers serving your user base. And if your company has embraced a cloud computing platform such as Amazon Web Services, you already appreciate the problem.
Fabric is one tool that helps solve that problem.
Fabric is a tool that, at its core, logs into a number of hosts over SSH, executes a set of commands, and possibly uploads or downloads files.
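As a rough illustration of that model, here is a sketch of a deploy task, assuming Fabric 2’s `Connection.put` and `Connection.run` methods. The archive name, application directory, and `myapp` service are hypothetical. The task only relies on an object with `run` and `put`, so the `RecordingConnection` helper (also hypothetical, not part of Fabric) can be used for a dry run that records commands instead of executing them over SSH.

```python
from typing import Any, Protocol

# Hypothetical names for illustration; replace with your project's values.
ARCHIVE = "myapp.tar.gz"
APP_DIR = "/srv/myapp"


class Remote(Protocol):
    """The subset of fabric.Connection that deploy() relies on."""

    def run(self, command: str, **kwargs: Any) -> Any: ...

    def put(self, local: str, remote: str, **kwargs: Any) -> Any: ...


def deploy(c: Remote) -> None:
    """Upload a release, unpack it, migrate, and restart the app on one host."""
    # Copy the locally built archive to the remote host.
    c.put(ARCHIVE, remote=f"/tmp/{ARCHIVE}")
    # Unpack it into the application directory.
    c.run(f"tar -xzf /tmp/{ARCHIVE} -C {APP_DIR}")
    # Apply Django migrations without prompting.
    c.run(f"cd {APP_DIR} && python manage.py migrate --noinput")
    # Restart the (hypothetical) service that runs the app.
    c.run("sudo systemctl restart myapp")


class RecordingConnection:
    """Dry-run stand-in for a Connection: records calls instead of using SSH."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, ...]] = []

    def run(self, command: str, **kwargs: Any) -> None:
        self.calls.append(("run", command))

    def put(self, local: str, remote: str, **kwargs: Any) -> None:
        self.calls.append(("put", local, remote))


if __name__ == "__main__":
    dry = RecordingConnection()
    deploy(dry)
    for call in dry.calls:
        print(*call)
```

In a real `fabfile.py` you would decorate `deploy` with `@task` (`from fabric import task`) and run it against several servers with something like `fab -H web1,web2 deploy`; Fabric then passes a `Connection` for each host as `c`.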
Check out Will Larsen’s post on how you can use Fabric. He has done a great job of explaining it.