Grant interviewed Sammy Yu, who worked on the search system for Digg.com, which uses Solr as its platform. Here are some notes from the interview:
Number of Documents in Digg’s index: 13 Million
Index Size (Lucene) on Disk: 8 GB
Architecture: Master-slave setup with 10 slaves, running behind a load balancer with some [...]
Just read an article posted on the Lucene blog - “Lucene and the Corporate Environment”
If the companies on the list of Lucene users are not “corporate” environments, then I don’t know what corporate means. If by corporate packaging, you mean it has a lot of bloat and charges exorbitant license fees, then no, unfortunately, Lucene [...]
I was looking through Lucene’s source code today (okay, tonight) to find out whether you can give Lucene hints to change the clause precedence during query execution. Unfortunately, I found that Lucene does not let users supply any such hint (I was looking at ConjunctionScorer).
At work, we have a use case, where we [...]
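To show why clause order matters for a conjunction, here is a minimal sketch in Python. It is a generic intersection over sorted doc-id lists, not Lucene's ConjunctionScorer, and the `order` argument is a hypothetical stand-in for the kind of hint discussed above. It counts how many lookups are needed so the two orderings can be compared.

```python
from bisect import bisect_left


def contains(sorted_ids, doc_id):
    """Binary-search a sorted posting list for doc_id."""
    i = bisect_left(sorted_ids, doc_id)
    return i < len(sorted_ids) and sorted_ids[i] == doc_id


def intersect(postings, order=None):
    """AND together sorted posting lists (at least one list is expected).

    `order` is a hypothetical caller-supplied hint: the indexes of the
    clauses in the order they should be evaluated. Without a hint, the
    shortest list drives the intersection.
    Returns (matching doc ids, number of lookups performed).
    """
    if order is None:
        order = sorted(range(len(postings)), key=lambda i: len(postings[i]))
    lead, *rest = (postings[i] for i in order)
    hits, probes = [], 0
    for doc_id in lead:
        matched = True
        for other in rest:
            probes += 1
            if not contains(other, doc_id):
                matched = False
                break
        if matched:
            hits.append(doc_id)
    return hits, probes
```

Both orderings return the same documents, but driving the loop from the rarest clause needs far fewer lookups, which is the motivation for wanting to influence the order.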
Are you doing range searches in Lucene / Solr in your application? If so, you can get a performance boost by using the new TrieRange package.
Here is a ppt that details the capability.
If you want to read more, you can read the article posted by Grant on Lucid’s site.
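The idea behind trie-based range search can be sketched in a few lines of Python. This is an illustration of the technique only, not the TrieRange API: the function names, the 16-bit value size, and the 4-bit precision step are all assumptions chosen for the example. Each value is indexed as a prefix term at several precisions, and a range query is split into a small set of prefix terms instead of one term per distinct value.

```python
def index_terms(value, bits=16, step=4):
    """Terms stored for one value: its prefix at every precision level."""
    return [(shift, value >> shift) for shift in range(0, bits, step)]


def split_range(lo, hi, bits=16, step=4):
    """Cover the inclusive range [lo, hi] with (shift, prefix) terms.

    Partial blocks at the edges use high-precision terms; the middle of
    the range is covered by coarser, lower-precision terms.
    """
    terms = []
    shift = 0
    while True:
        diff = 1 << (shift + step)
        mask = ((1 << step) - 1) << shift
        has_lower = (lo & mask) != 0      # lo starts inside a block
        has_upper = (hi & mask) != mask   # hi ends inside a block
        next_lo = ((lo + diff) if has_lower else lo) & ~mask
        next_hi = ((hi - diff) if has_upper else hi) & ~mask
        if shift + step >= bits or next_lo > next_hi:
            # Coarsest level reached, or nothing left in the middle.
            terms.extend((shift, p) for p in range(lo >> shift, (hi >> shift) + 1))
            return terms
        if has_lower:
            terms.extend((shift, p) for p in range(lo >> shift, ((lo | mask) >> shift) + 1))
        if has_upper:
            terms.extend((shift, p) for p in range((hi & ~mask) >> shift, (hi >> shift) + 1))
        lo, hi = next_lo, next_hi
        shift += step


def matches(value, query_terms, bits=16, step=4):
    """A document matches if any of its indexed terms is in the query."""
    return any(term in query_terms for term in index_terms(value, bits, step))
```

With these settings, a range such as 1000 to 50000 is covered by a few dozen terms rather than tens of thousands, at the cost of storing several terms per value in the index.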
Solr now supports Tika through ExtractingRequestHandler
It is now possible to send any of Tika’s supported document types (MS Office, PDF, XML, HTML, etc.) and have the content extracted and then indexed, all within Solr.
A natural enhancement / extension to a metadata extraction and identification toolkit would be to layer a content analysis framework on top. For [...]
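As a rough sketch of what sending a document for extraction might look like, the Python snippet below builds (but does not send) an HTTP request. It assumes the handler is mapped to `/update/extract`, that the schema's unique key field is named `id`, and that Solr runs at a base URL you supply; the function name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_url, doc_id, payload, content_type):
    """Build a POST that uploads raw document bytes for extraction.

    Assumptions: the handler path is /update/extract and the unique key
    field is `id` (set through the literal.id parameter).
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return Request(
        f"{solr_url}/update/extract?{params}",
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would submit it; with `commit=true` the extracted content should become searchable once the request completes.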
Grant Ingersoll has published a new article on IBM developerWorks covering the new features in Solr 1.3.
“Did you mean” Spellchecking
Finding similar pages (More like this)
Editorial results placement - Ability to specify that a particular document (or documents) appear at a particular place in the search results.
Distributed Search - Solr adds distributed search capabilities [...]
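Several of the features listed above are switched on through request parameters. The sketch below builds a query URL that uses three of them; it assumes the spellcheck and MoreLikeThis components are already configured on the request handler, and the function name, host names, and field names are made up for illustration.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, shards=(), spellcheck=False, mlt_fields=()):
    """Build a /select URL with optional Solr 1.3 feature parameters."""
    params = [("q", query)]
    if shards:
        # Distributed search: comma-separated list of shard addresses.
        params.append(("shards", ",".join(shards)))
    if spellcheck:
        # "Did you mean" suggestions, collated into a corrected query.
        params += [("spellcheck", "true"), ("spellcheck.collate", "true")]
    if mlt_fields:
        # More like this: fields used to find similar documents.
        params += [("mlt", "true"), ("mlt.fl", ",".join(mlt_fields))]
    return f"{base_url}/select?{urlencode(params)}"
```

Editorial results placement is left out here because it is driven mainly by server-side configuration rather than by query parameters.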