Abstraction in Search

Another great event I attended this year: PyCon India 2012. It was better organized and had better talks, a bigger audience, a job fair and more fun than ever before 🙂 Not to forget the evening dinner for speakers 😉 I loved every bit of it: talking to experts, talking to Python enthusiasts, answering their questions and wondering why I was not like the younger folks when I was younger 😛

Vishal and I delivered a talk on ‘Rapid development of website search in Python’. We spoke about:

  • Why search is imperative for websites
  • How the schema is defined and analyzers are chosen
  • How indexing and searching work, with appropriate flowcharts
  • How search can be easily integrated with your web application
  • What are the design and development considerations for implementing it

We also shared our observations on the facets of a good search solution. It should be:

  • Integral to the website development
  • Decoupled from the web framework used for website development
  • Adaptable (to the scale and requirements of the website)
  • And, most importantly, rapidly developed and deployed

This talk sparked a new design concept, Abstraction in Search (never tried before, as far as we know), and contributed to the Python community at large.

Preface

We all understand that no single solution fits two different problems. The same applies to search engines. One engine may index and commit quickly but search more slowly than an equally feature-rich engine. Hence a search engine can be the best solution for one website and an utter misfit for another.

Problem

Developing search around one particular algorithm or engine and plugging it into every website you develop is no less than digging your own grave! Why the h**l would you assume that the one search solution you’ve developed for your large-scale website is suitable for small or medium-sized websites?

Solution

We propose developing customized search engine implementations suited to small, medium and large websites. Once you have these implementations, develop an Abstraction Layer over them. The Abstraction Layer would ensure:

  • Freedom to choose an engine based on applicability and adaptability to the website
  • Develop once, reuse many times
  • The search engine to call can be decided at run time

The abstraction layer could be implemented using the well-known facade pattern!
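To make the facade idea concrete, here is a minimal sketch in plain Python. It is not the fsMgr code: the class names (`SearchEngine`, `LinearScanEngine`, `InvertedIndexEngine`, `Search`) and the `"small"`/`"large"` keys are hypothetical, and the two toy engines only stand in for real engine adapters.

```python
from abc import ABC, abstractmethod


class SearchEngine(ABC):
    """Common interface each engine adapter is assumed to implement."""

    @abstractmethod
    def index(self, doc_id, text):
        """Add one document to the engine."""

    @abstractmethod
    def search(self, query):
        """Return a sorted list of ids of documents containing every query term."""


class LinearScanEngine(SearchEngine):
    """Toy engine for small sites: cheap indexing, scans all documents per query."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, text):
        self._docs[doc_id] = set(text.lower().split())

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return []
        return sorted(d for d, words in self._docs.items()
                      if all(t in words for t in terms))


class InvertedIndexEngine(SearchEngine):
    """Toy engine for larger sites: more work at index time, faster lookups."""

    def __init__(self):
        self._postings = {}  # term -> set of doc ids

    def index(self, doc_id, text):
        for term in set(text.lower().split()):
            self._postings.setdefault(term, set()).add(doc_id)

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


# Hypothetical registry: the key can come from config, so the engine is picked at run time.
ENGINES = {"small": LinearScanEngine, "large": InvertedIndexEngine}


class Search:
    """Facade: the web application only ever talks to this class."""

    def __init__(self, engine="small"):
        self._engine = ENGINES[engine]()

    def index(self, doc_id, text):
        self._engine.index(doc_id, text)

    def search(self, query):
        return self._engine.search(query)
```

With this shape, switching engines is a one-word change at the call site, e.g. `Search("large")` instead of `Search("small")`, and the rest of the website code stays untouched.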

Design

We propose a simple-to-understand SVC model (based on the MVC model). SVC stands for Search View Controller. In SVC, the Controller calls search.py with the appropriate search engine to find results for the user's input keywords. search.py is an abstraction built over search engine implementations that suit small, mid-sized and large websites. Which search solution to call through the search.py abstraction is the website developer's decision (as s/he understands the requirements of the website and the search solution for it). The selected search engine generates the results for the input query terms and passes them to the Controller via search.py. The Controller then applies the search results to the View (templates) and renders them to the user.
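The SVC flow described above can be sketched in a few lines of standard-library Python. Everything here is an assumption made for illustration: the `PAGES` data, the two stand-in engines, and the names `search`, `render` and `handle_search` are hypothetical and are not taken from fsMgr.

```python
from string import Template

# Hypothetical page store standing in for an indexed website.
PAGES = {
    "/home": "welcome to our python site",
    "/docs/search": "how search works in python",
}


def _match_all(query):
    """Stand-in engine: a page must contain every query term."""
    terms = query.lower().split()
    return sorted(url for url, text in PAGES.items()
                  if terms and all(t in text.split() for t in terms))


def _match_any(query):
    """Stand-in engine: a page must contain at least one query term."""
    terms = query.lower().split()
    return sorted(url for url, text in PAGES.items()
                  if any(t in text.split() for t in terms))


# --- Search: plays the role of search.py, hiding which engine runs ---
_ENGINES = {"all": _match_all, "any": _match_any}


def search(query, engine):
    return _ENGINES[engine](query)


# --- View: a template that only knows how to display results ---
RESULTS_VIEW = Template("Results for '$query':\n$items")


def render(query, urls):
    items = "\n".join("- " + url for url in urls) or "(no results)"
    return RESULTS_VIEW.substitute(query=query, items=items)


# --- Controller: the developer's engine choice lives here ---
SITE_ENGINE = "all"


def handle_search(query, engine=SITE_ENGINE):
    results = search(query, engine)  # Controller -> Search
    return render(query, results)    # Controller -> View
```

For example, `print(handle_search("python search"))` runs the query through the engine chosen for the site and prints the rendered result list.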

Prototype Implementation

We’ve developed a prototype of the idea discussed above, called fsMgr. fsMgr assumes that the web pages to be searched are already available (or scraped) in a tree structure.
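As a rough illustration of what "pages in a tree structure" might look like to an indexer, the sketch below walks a directory tree and yields the text of each page. The layout is an assumption (`.html` files under one root directory), and `load_pages` is a hypothetical helper, not part of fsMgr.

```python
import os
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between tags; the markup itself is discarded."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def load_pages(root):
    """Yield (path relative to root, plain text) for every .html file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".html"):
                continue  # assumption: only .html files are pages
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as fh:
                parser = _TextExtractor()
                parser.feed(fh.read())
                parser.close()
            # Collapse runs of whitespace so the text is ready for tokenizing.
            text = " ".join(" ".join(parser.chunks).split())
            yield os.path.relpath(path, root), text
```

Each yielded pair could then be handed to whichever engine the abstraction layer selects. Note that this simple extractor also keeps the contents of `<script>` and `<style>` elements, which a real indexer would probably want to skip.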

search.py of fsMgr abstracts the Whoosh and PyLucene search engines. This demonstrates how either engine can be used for website search, depending on the website's requirements.

We use Tornado, a Python web server, as the Controller. It provides request handling, so we can expose simple search and advanced search capabilities (such as highlighted results, a did-you-mean spell checker and a more-like-this document search) to users.

Tornado’s templates serve as the Views in this prototype.

Code

The source code of this prototype implementation is available at fsMgr.

SVC Architecture
