Tornado – Whoosh

Search functionality is often considered a “good to have” feature in website development, but search plays a crucial role in helping website visitors locate relevant information. Search capabilities can be added to a website easily with tools such as Lucene, Solr, Elasticsearch and Haystack, among others.

This blog post discusses Whoosh and how it can be integrated with the Tornado web server.

Whoosh is a fast search engine library developed by Matt Chaput. It supports field-based full-text indexing and search, storage, text analysis, posting formats and scoring algorithms, and you can benefit from features such as highlighted search results, fuzzy search, document-based search (“more like this”) and a spell checker (“did you mean”). Whoosh’s APIs are Pythonic, and the library is written in pure Python 🙂

Let’s take the example of a blog, using the code snippet below (tornadowhoosh.py):

import os

import tornado.ioloop
import tornado.web
from whoosh import index
from whoosh.fields import Schema, ID, TEXT
from whoosh.qparser import QueryParser


class Search(object):
    def __init__(self, indexdir, searchstr):
        self.indexdir = indexdir    # directory where the search index is created
        self.searchstr = searchstr  # string to search for

    def searcher(self):
        # Document schema: the fields that describe a blog post.
        schema = Schema(
            path=ID(unique=True, stored=True),
            title=TEXT(stored=True, phrase=False),
            content=TEXT(),
            tag=TEXT(stored=True),
            category=TEXT(stored=True))
        if not os.path.exists(self.indexdir):
            os.mkdir(self.indexdir)
        # create_in() creates a fresh index, so this demo rebuilds it on every search.
        ix = index.create_in(self.indexdir, schema)
        writer = ix.writer()
        writer.add_document(title=u"Welcome", content=u"This is welcome blog!",
                            path=u"/welcome", tag=u"Welcome", category=u"Welcome")
        writer.add_document(title=u"Python Whoosh", content=u"Whoosh search library in pure Python",
                            path=u"/whoosh", tag=u"whoosh", category=u"Search")
        writer.add_document(title=u"Python Tornado", content=u"Tornado Web Server for real-time web apps",
                            path=u"/tornado", tag=u"tornado", category=u"Web Server")
        writer.commit()
        # Parse the search string as a query against the "content" field.
        queryparser = QueryParser("content", schema=schema)
        # The searcher is left open because the returned Results object reads from it.
        s = ix.searcher()
        return s.search(queryparser.parse(self.searchstr), limit=50)


class Home(tornado.web.RequestHandler):
    def get(self):
        # Show the search form.
        self.render("searchform.html")


class Srch(tornado.web.RequestHandler):
    def get(self):
        q = self.get_argument("q")  # the word typed into the search form
        srch = Search("./indexdir", q)
        results = srch.searcher()
        self.write(str(results))


application = tornado.web.Application([
    (r"/", Home),
    (r"/search", Srch),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.current().start()

In the code above:

  • class Search provides the search capabilities using Whoosh.
  • The __init__ method accepts indexdir (the directory where the search index is created) and searchstr (the string to search for).
  • The searcher() method first defines a document schema (the fields that describe a blog post) and creates an index in indexdir based on that schema. It then creates a writer object, uses it to add three blog posts, and commits them. Finally, it parses searchstr against the content field and calls the searcher’s search() method, which returns at most 50 results.
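The Home handler renders the following search form template, searchform.html. Save it in the same directory as the Python script, since Tornado looks for templates next to the calling module when no template_path is configured.

<html>
<head>
  <title>Search</title>
</head>
<body>
  <form action="/search" method="get">
    <input type="text" name="q">
    <input type="submit" name="submit">
  </form>
</body>
</html>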

  • When the search form is submitted, a GET request carrying the search word in the q parameter is sent to http://localhost:8888/search.
  • class Srch handles this request and in turn calls the Search class, which implements the search functionality with Whoosh.

When the user searches for tornado, we get the output below, which shows that the word tornado was found in 1 document and that the search took about 0.00048 seconds:

<1/1 Results for Term('content', u'tornado', boost=1.0) runtime=0.000482797622681>
