Productification

Intersection between Technology, User Experience and Product Innovation


18 Nov

How Solr powers Local / Geospatial Applications


Check out the webinar on how Solr and Lucene are used to power local search and geospatial applications. It includes the work I was involved with at AT&T Interactive building YP.com.

http://www.bitpipe.com/detail/RES/1257457967_42.html

Posted via email from Sameer’s posterous


Filed under: Products, Search
14 Sep

Bing making strides - now Visual Search


Bing announced its new search feature today: Visual Search. Visual Search lets you browse information as images. The information on the home page is organized into groups, which Bing calls “galleries”, and Bing offers a visual mechanism to slice and dice that information. Apart from the facets on the left-hand side, the result area is now populated with images that users can hover over to get more information, or click to dive deeper.

This is definitely a step up from what the other major search engines provide, since Bing is starting to experiment with visualizations specific to the “type” of information. Examples can be seen in Product Search, Book Search, Auto Search, etc.

Personally, I love it: it makes search more engaging and leads the user into discovery mode. The only downside is that I had to install a “plug-in” on my computer :(


20 Aug

Insight into Search on Digg.com powered by Solr


Grant interviewed Sammy Yu, who worked on the search system for Digg.com, which uses Solr as its platform. Here are some notes from the interview:

Number of Documents in Digg’s index: 13 Million

Index Size (Lucene) on Disk: 8 GB

Architecture: Master/slave setup with 10 slaves, running behind a load balancer with some caching.

Query Volume: 4.8 million queries / day
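To put those numbers in perspective, here is a back-of-the-envelope sketch in Java. It assumes traffic is spread evenly over the day and evenly across the 10 slaves, which real traffic is not, so treat the results as averages rather than peaks. The RoundRobin class is a hypothetical stand-in for the load balancer, not a description of Digg's actual setup.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Back-of-the-envelope load math for the Digg numbers above.
// Assumes traffic is spread evenly over the day and across slaves (real traffic peaks higher).
final class DiggLoadMath {
    static final long QUERIES_PER_DAY = 4_800_000L;  // from the interview notes
    static final int SLAVES = 10;                     // from the interview notes
    static final int SECONDS_PER_DAY = 24 * 60 * 60;  // 86,400

    /** Average queries per second hitting the load balancer. */
    static double clusterQps() {
        return (double) QUERIES_PER_DAY / SECONDS_PER_DAY;
    }

    /** Average queries per second per slave, if the balancer splits load evenly. */
    static double perSlaveQps() {
        return clusterQps() / SLAVES;
    }

    /** Hypothetical round-robin picker standing in for the load balancer (not Digg's actual setup). */
    static final class RoundRobin {
        private final List<String> slaves;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobin(List<String> slaves) {
            this.slaves = List.copyOf(slaves);
        }

        /** Returns the next slave in turn; floorMod keeps the index valid after int overflow. */
        String pick() {
            return slaves.get(Math.floorMod(next.getAndIncrement(), slaves.size()));
        }
    }

    public static void main(String[] args) {
        System.out.printf("cluster average: %.1f queries/sec%n", clusterQps());
        System.out.printf("per-slave average: %.1f queries/sec%n", perSlaveQps());
    }
}
```

That works out to roughly 55.6 queries per second across the cluster and about 5.6 per slave on average; peak load will be higher.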


Filed under: Search
17 Jul

Lucene / Solr still needs hackers to get up and running


I just read an article posted on the Lucene blog: “Lucene and the Corporate Environment”.

If the companies on the list of Lucene users are not “corporate” environments, then I don’t know what corporate means. If by corporate packaging you mean lots of bloat and exorbitant license fees, then no, unfortunately, Lucene is not ready to succeed in the corporate environment. If “corporate environment” means saving time, money and energy, then Lucene should break out the khakis and button-down shirt and start punching the clock.

Before I go on with this rant, I want to say that I love Lucene/Solr for what they have to offer, and I myself have been using, customizing and delivering innovative solutions based on Lucene since 2001. But after spending years with other enterprise search products and with Lucene/Solr, I agree with the statement that Lucene/Solr still has some way to go before corporate/enterprise adoption is really easy.

One of the things I see in a leading enterprise solution is that it is relatively easy to see the value of the product right after installation, and it does not take a bunch of hackers to get it up and running. From where I sit, most companies deploying a Solr/Lucene-based solution need programmers who understand IR/search on their payroll to get the system running.

In a decent-sized deployment, infrastructure work is needed for monitoring, replication, performance, etc., which other enterprise search products mostly have built in.

For companies that have data in various silos (databases, file systems, intranets), Solr/Lucene does not yet provide a full suite of connectors to ingest that data. There are connectors, but again, one has to understand them, how to use them and how to integrate them with Solr/Lucene. Good enterprise software solutions provide management interfaces for configuring connectors, along with a variety of connector choices (commercial and open source).

Yes, Lucene/Solr is a great platform for companies that want to go above and beyond in delivering value, but having the right expertise in-house is key to success. And yes, Solr/Lucene still needs some work before it is an easy deployment for enterprises.


01 Jul

Search User Interfaces - Book


A book by Marti Hearst for people designing and building search user interfaces.

Search User Interfaces: the book is also available to read online.

Chapters:
1: Design of Search User Interfaces
2: Evaluation of Search User Interfaces
3: Models of the Information Seeking Process
4: Query Specification
5: Presentation of Search Results
6: Query Reformulation
7: Supporting the Search Process
8: Integrating Navigation with Search
9: Personalization in Search
10: Information Visualization for Search Interfaces
11: Information Visualization for Text Analysis
12: Emerging Trends in Search Interfaces


Filed under: Search
01 Jul

How to influence the query plan in Lucene / Solr?


I was looking through Lucene’s source code today (okay, tonight) to find out whether you can give Lucene hints to change clause precedence during query execution. Unfortunately, Lucene does not let users supply any such hint (I was looking at ConjunctionScorer).

At work, we have a use case where we know something about the data that gets indexed into the system. After query classification and pre-processing, we can use this knowledge to tell Lucene what we think the execution order of each clause in the query should be. This could drastically improve query performance, especially for AND queries where, run independently, one clause would return a handful of results and the other thousands or even tens of thousands.
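To make the ordering argument concrete, here is a minimal Java sketch (my own illustration, not how Lucene’s ConjunctionScorer is written): an AND over sorted doc-id lists where the caller decides which clause leads. The lead clause’s documents are each looked up in the other clauses, so the work is proportional to the size of the leading clause. The class, method names and the “probe” counter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Lucene's ConjunctionScorer: an AND over sorted doc-id lists
// where the caller decides which clause leads (clauses[0]) and which are only checked.
final class OrderedConjunction {
    /** Number of advance() calls made by the last intersect() call. */
    static long probes = 0;

    /** Index of the first entry >= target in list[from..], or list.length if there is none. */
    static int advance(int[] list, int from, int target) {
        probes++;
        int i = Arrays.binarySearch(list, from, list.length, target);
        return i >= 0 ? i : -i - 1;  // a miss returns -(insertion point) - 1
    }

    /** Intersects sorted, duplicate-free doc-id lists in the given clause order. */
    static List<Integer> intersect(int[][] clauses) {
        probes = 0;
        List<Integer> hits = new ArrayList<>();
        int[] pos = new int[clauses.length];  // cursor into each non-lead clause
        int[] lead = clauses[0];
        outer:
        for (int doc : lead) {
            for (int c = 1; c < clauses.length; c++) {
                pos[c] = advance(clauses[c], pos[c], doc);
                if (pos[c] == clauses[c].length) break outer;   // clause exhausted: no more matches
                if (clauses[c][pos[c]] != doc) continue outer;  // doc missing from this clause
            }
            hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] rare = {7, 5000, 9999};    // a clause matching a handful of docs
        int[] common = new int[10_000];  // a clause matching 10,000 docs
        for (int d = 0; d < common.length; d++) common[d] = d;
        System.out.println(intersect(new int[][] {rare, common}) + " probes=" + probes);
        System.out.println(intersect(new int[][] {common, rare}) + " probes=" + probes);
    }
}
```

Both orders return [7, 5000, 9999], but leading with the rare clause takes 3 probes versus 10,000 when the common clause leads, which is exactly the kind of saving a clause-order hint would unlock.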

In the past, I have worked on IR systems that maintained cardinalities (statistics) about the index, which helped optimize a query and produce the best plan possible within the given time and resources.
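A cardinality-driven planner for AND queries could look roughly like the Java sketch below. It is hypothetical: the docFreq map stands in for whatever cardinality statistics the system maintains, and the result-size estimate assumes the clauses are statistically independent, which real data often violates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a cardinality-based planner for an AND query.
// docFreq maps each clause to the number of documents it matches (assumed statistics).
final class ConjunctionPlanner {
    /** Orders clauses rarest first; clauses without stats are assumed expensive and run last. */
    static List<String> plan(List<String> clauses, Map<String, Long> docFreq) {
        List<String> ordered = new ArrayList<>(clauses);
        ordered.sort(Comparator.comparingLong(c -> docFreq.getOrDefault(c, Long.MAX_VALUE)));
        return ordered;
    }

    /**
     * Estimated result size under an independence assumption: N * product(df_i / N).
     * Clauses without stats are assumed to match everything (no reduction).
     */
    static double estimateHits(List<String> clauses, Map<String, Long> docFreq, long numDocs) {
        double estimate = numDocs;
        for (String c : clauses) {
            estimate *= (double) docFreq.getOrDefault(c, numDocs) / numDocs;
        }
        return estimate;
    }

    public static void main(String[] args) {
        // Illustrative counts for a local-search style query, not real data.
        Map<String, Long> df = Map.of("city:pasadena", 40L, "category:plumber", 20_000L);
        List<String> query = List.of("category:plumber", "city:pasadena");
        System.out.println(plan(query, df));
        System.out.println(estimateHits(query, df, 1_000_000L));
    }
}
```

Here plan() moves city:pasadena to the front, and the independence estimate comes out at about 0.8 matching documents in a million-document index; a real optimizer would also weigh correlations between fields such as city and category.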

Does anyone know whether Lucene maintains these cardinalities internally? If so, how does it impact the query execution plan?


Filed under: Development, Search
19 May

Solr/Lucene Feature Alert: TrieRange Capabilities


Are you doing range searches in Lucene / Solr in your application? If so, you can get a performance boost by using the new TrieRange package.

Here is a PowerPoint presentation that details the capability.

To read more, see the article Grant posted on Lucid’s site.
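As I understand it, the trie idea is to index each number at several precisions (the full value plus progressively shorter prefixes), so a range query can be rewritten into a few coarse terms for the middle of the range plus fine terms only at the ragged edges, instead of one term per distinct value. Here is a minimal Java sketch of that decomposition, assuming non-negative values and a fixed precision step; the names are illustrative and this is not the TrieRange package’s API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of trie-style range splitting (illustrative names, not the TrieRange API).
// Assumes each value v in [0, 2^bits) is indexed as the terms (v >>> shift) for
// shift = 0, step, 2*step, ..., so one coarse term stands for a whole block of values.
final class TrieRangeSketch {
    /** Terms (v >>> shift) for v from lo to hi, all at one precision level. */
    static final class PrefixRange {
        final int shift;
        final long lo;
        final long hi;

        PrefixRange(int shift, long lo, long hi) {
            this.shift = shift;
            this.lo = lo;
            this.hi = hi;
        }

        long termCount() {
            return (hi >>> shift) - (lo >>> shift) + 1;
        }

        @Override
        public String toString() {
            return "shift " + shift + ": [" + lo + ", " + hi + "]";
        }
    }

    /** Splits [lo, hi] into prefix ranges; requires 0 <= lo <= hi < 2^bits, bits <= 62, step >= 1. */
    static List<PrefixRange> split(long lo, long hi, int step, int bits) {
        List<PrefixRange> out = new ArrayList<>();
        for (int shift = 0; ; shift += step) {
            long diff = 1L << (shift + step);          // block size one level coarser
            long mask = ((1L << step) - 1L) << shift;  // bits resolved at this level
            boolean hasLower = (lo & mask) != 0L;      // lo is not at the start of a coarser block
            boolean hasUpper = (hi & mask) != mask;    // hi is not at the end of a coarser block
            long nextLo = (hasLower ? lo + diff : lo) & ~mask;
            long nextHi = (hasUpper ? hi - diff : hi) & ~mask;
            if (shift + step >= bits || nextLo > nextHi) {
                // Coarsest level reached, or nothing left for a coarser level: finish here.
                out.add(new PrefixRange(shift, lo, hi));
                return out;
            }
            if (hasLower) out.add(new PrefixRange(shift, lo, lo | mask));   // ragged lower edge
            if (hasUpper) out.add(new PrefixRange(shift, hi & ~mask, hi));  // ragged upper edge
            lo = nextLo;  // the middle part continues at the next, coarser level
            hi = nextHi;
        }
    }

    public static void main(String[] args) {
        long terms = 0;
        for (PrefixRange r : split(3, 300, 4, 16)) {
            System.out.println(r);
            terms += r.termCount();
        }
        System.out.println(terms + " terms instead of " + (300 - 3 + 1));
    }
}
```

For the range 3..300 with a step of 4 bits, this yields two fine-grained edge ranges ([3, 15] and [288, 300]) and one coarse range covering 16..287, for 43 terms instead of 298. A smaller step means more levels and fewer terms per query, at the cost of more terms in the index.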


Filed under: Search
19 May

Search Trends: Length of search queries is increasing


Hitwise reports that the searches users perform have grown longer compared with last year.

Longer search queries (5+ words) increased 10 percent from January 2008 to January 2009.

Number of keywords per query

Source: Hitwise - [pdf]


Filed under: Search, Trends
16 Apr

Digg gets Faceted Navigation (Solr?)


Type a search on Digg and you’ll see faceted navigation on the left-hand side. I think they do an awesome job of using sparklines to show the trend in post volume for the search.

Digg Faceted Navigation
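I don’t know how Digg computes these, but the data behind a facet list and a volume sparkline boils down to two aggregations over the matching posts: a count per field value and a count per day. Here is a minimal Java sketch under that assumption; the Post class, its fields and the topic values are hypothetical. In Solr this would roughly correspond to field faceting plus date faceting.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: facet counts and per-day volume over the posts matching a search.
final class SparklineFacet {
    /** A matched post; only the fields this sketch needs (illustrative, not Digg's schema). */
    static final class Post {
        final String topic;
        final LocalDate date;

        Post(String topic, LocalDate date) {
            this.topic = topic;
            this.date = date;
        }
    }

    /** Counts matches per topic, like a facet on a "topic" field (sorted by topic name). */
    static Map<String, Integer> facetCounts(List<Post> hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Post p : hits) counts.merge(p.topic, 1, Integer::sum);
        return counts;
    }

    /** Counts matches per day from 'from' to 'to' inclusive; days without posts stay 0. */
    static int[] dailyVolume(List<Post> hits, LocalDate from, LocalDate to) {
        int days = (int) (to.toEpochDay() - from.toEpochDay()) + 1;
        int[] volume = new int[days];
        for (Post p : hits) {
            long offset = p.date.toEpochDay() - from.toEpochDay();
            if (offset >= 0 && offset < days) volume[(int) offset]++;  // ignore posts outside the window
        }
        return volume;
    }
}
```

For three matching posts, one on 2009-04-14 and two on 2009-04-16, dailyVolume over 14 to 16 April returns [1, 0, 2]; keeping the empty middle day as a zero is what lets the sparkline show a dip rather than skipping the day.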

24 Feb

Search / IR Papers at WWW 2009


From the WWW2009 list of accepted papers, below are the papers in the Search track:

  • Xing Yi, Hema Raghavan and Chris Leggetter - Discover Users’ Specific Geo Intention in Web Search
  • Xiangfu Meng, Z. M. Ma and Li Yan - Answering Approximate Queries over Autonomous Web Databases
  • Huanhuan Cao, Daxin Jiang, Jian Pei, Enhong Chen and Hang Li - Towards Context-Aware Search by Learning A Very Large Variable Length Hidden Markov Model from Search Logs
  • Eustache Diemert and Gilles Vandelle - Unsupervised Query Categorization using Automatically-Built Concept Graphs
  • Jian Hu, Gang Wang, Fred Lochovsky and Zheng Chen - Understanding User’s Query Intent with Wikipedia
  • Sreenivas Gollapudi and Aneesh Sharma - An Axiomatic Approach to Result Diversification
  • Flavio Chierichetti, Ravi Kumar and Prabhakar Raghavan - Compressed web indexes
  • Andrei Broder, Flavio Chierichetti, Vanja Josifovski, Ravi Kumar, Sandeep Pandey and Sergei Vassilvitskii - Nearest-Neighbor Caching for Content-Match Applications
  • Paul Bennett, Max Chickering and Anton Mityagin - Learning Consensus Opinion: Mining Data from a Labeling Game
  • Andrei Broder, Peter Ciccolo, Evgeniy Gabrilovich, Vanja Josifovski, Donald Metzler, Lance Riedel and Jeffrey Yuan - Online Expansion of Rare Queries for Sponsored Search
  • Xuerui Wang, Andrei Broder, Marcus Fontoura and Vanja Josifovski - A Search-based Method for Forecasting Ad Impression in Contextual Advertising
  • Olivier Chapelle and Ya Zhang - A Dynamic Bayesian Network Click Model for Web Search Ranking
  • Hao Yan, Shuai Ding and Torsten Suel - Inverted Index Compression and Query Processing with Optimized Document Ordering
  • Qingqing Gan and Torsten Suel - Improved Techniques for Result Caching in Web Search Engines
  • Shuai Ding, Jinru He, Hao Yan and Torsten Suel - Using Graphics Processors for High Performance IR Query Processing
  • Jay Chen, Lakshminarayanan Subramanian and Jinyang Li - RuralCafe: Web Search in the Rural Developing World
  • Shengyue Ji, Guoliang Li, Chen Li and Jianhua Feng - Efficient Interactive Fuzzy Keyword Search
  • Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Arnd Konig and Dong Xin - Exploiting Web Search Engines to Search Structured Information Sources
  • Deepayan Chakrabarti, Ravi Kumar and Kunal Punera - Quicklink Selection for Navigational Query Results

Filed under: Search