Another great event I attended this year: PyCon India 2012. It was better organized, had better talks, a bigger audience, a job fair, and more fun than ever before. 🙂 Not to forget the evening dinner for speakers! I loved every bit of it: talking to experts, talking to Python enthusiasts, answering their questions, and wondering why I was not like these younger folks when I was younger. 🙂
Vishal and I delivered a talk on ‘Rapid development of website search in Python’. We spoke about:
- Why search is imperative in websites
- How the schema is defined and analyzers are chosen
- How indexing and searching work, with appropriate flowcharts
- How search can be easily integrated with your web application
- The design and development considerations for implementing it
We also shared our observations on the facets of a good search solution. It should be:
- Integral to the website development
- Decoupled from the web framework used for website development
- Adaptable to the scale and requirements of the website
- And, most importantly, rapidly developed and deployed
This talk proposed a new design concept of abstraction in search (not tried before, as far as we know) and contributed it to the Python community at large.
Preface
We all understand that no single solution fits two different problems. The same holds for search engines. A search engine may have fast indexing and committing capabilities but a slower searching algorithm when compared to an equally feature-rich engine. Hence, a search engine that is the best solution for one website may be an utter misfit for another.
Problem
Now, developing search with one particular algorithm or one particular engine and plugging it into every website that you develop is no less than digging your own grave! Why the h**l would you assume that the one search solution you’ve developed for your large-scale website is suitable for other small- or medium-sized websites?
Solution
We propose developing customized search engines that are adaptable to small-, medium-, and large-scale websites. Once you have the search engine implementations, develop an abstraction layer over these engines. The abstraction layer ensures:
- Freedom to choose an engine based on applicability and adaptability to the website
- Develop once, reuse many times
- The call to a search engine can be decided at run time
The abstraction layer could be implemented using the well-known facade pattern!
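To make this concrete, here is a minimal sketch of such a facade in Python. The names (SearchEngine, get_engine, the engine wrappers) are ours for illustration and not the actual fsMgr API, and the concrete wrappers are left as stubs:

```python
from abc import ABC, abstractmethod


class SearchEngine(ABC):
    """Common interface that every concrete engine wrapper implements."""

    @abstractmethod
    def index(self, documents):
        """Build or update the index from an iterable of documents."""

    @abstractmethod
    def search(self, query, limit=10):
        """Return a list of result dicts for the given query string."""


class WhooshEngine(SearchEngine):
    """Stub wrapper around a Whoosh index (real logic omitted)."""

    def index(self, documents):
        pass  # create the Whoosh schema and write documents here

    def search(self, query, limit=10):
        return []  # parse the query and search the Whoosh index here


class LuceneEngine(SearchEngine):
    """Stub wrapper around a PyLucene index (real logic omitted)."""

    def index(self, documents):
        pass

    def search(self, query, limit=10):
        return []


# The facade: the website only ever calls get_engine(), so the concrete
# engine can be swapped by changing one name (e.g. read from a config file).
_ENGINES = {"whoosh": WhooshEngine, "lucene": LuceneEngine}


def get_engine(name="whoosh"):
    return _ENGINES[name]()
```

The website code asks for an engine by name (for example, `engine = get_engine("whoosh")`) and only ever uses the common index()/search() interface, which is what keeps the search layer decoupled from the web framework.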
Design
We propose a simple-to-understand SVC model (based on the MVC model). SVC stands for Search View Controller. In SVC, the Controller calls search.py with the appropriate search engine to find results for the user’s query terms. search.py is an abstraction built over the search engine implementations, and it can adapt to small-, mid-, and large-scale websites. The decision of which search solution search.py calls rests with the website developer (as s/he understands the requirements of the website and of the search solution for it). The selected search engine then generates the results for the query terms and passes them to the Controller via search.py. The Controller applies the results to the View (templates) and renders them to the user.
Prototype Implementation
We’ve developed a prototype of the idea discussed above (termed fsMgr). fsMgr assumes that the webpages that need to be searched are already available (or scraped) in a tree structure.
The search.py of fsMgr abstracts the Whoosh and PyLucene search engines. By doing this, we demonstrate how either of these engines can be leveraged for website search, based on the website’s requirements.
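For reference, here is a tiny standalone Whoosh snippet of the kind such a wrapper builds on; the schema fields, the sample document, and the index directory are illustrative, not taken from fsMgr:

```python
import os

from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID
from whoosh.qparser import QueryParser

# Define the schema and build a tiny index (directory name is arbitrary).
schema = Schema(path=ID(stored=True),
                title=TEXT(stored=True),
                content=TEXT(stored=True))
os.makedirs("indexdir", exist_ok=True)
ix = create_in("indexdir", schema)

writer = ix.writer()
writer.add_document(path=u"/about",
                    title=u"About us",
                    content=u"We build Python search tools")
writer.commit()

# Search the index and print highlighted snippets for each hit.
with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse(u"python search")
    for hit in searcher.search(query, limit=10):
        print(hit["path"], hit["title"], hit.highlights("content"))
```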
We use Python’s Tornado web server as the Controller. It provides request handling capabilities, so we can expose simple search as well as advanced search capabilities (such as highlighted results, a did-you-mean spell-checker, and a more-like-this document finder) to users.
Tornado’s template capabilities are used as Views in this prototype.
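A bare-bones sketch of how the Controller and View could be wired together with Tornado follows. The handler, URL, template file, and the get_engine import (from the facade sketch above) are illustrative, not the actual fsMgr code:

```python
import os

import tornado.ioloop
import tornado.web

from search import get_engine  # the search.py abstraction; name is illustrative


class SearchHandler(tornado.web.RequestHandler):
    """Controller: reads the query, asks search.py for results, renders the View."""

    def get(self):
        query = self.get_argument("q", "")
        engine = get_engine("whoosh")          # chosen by the developer/config at run time
        results = engine.search(query) if query else []
        # Tornado template as the View; results.html is a hypothetical template.
        self.render("results.html", query=query, results=results)


def make_app():
    return tornado.web.Application(
        [(r"/search", SearchHandler)],
        template_path=os.path.join(os.path.dirname(__file__), "templates"),
    )


if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()
```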
Code
The source code of this prototype implementation is available at fsMgr.
SVC Architecture