Search is often treated as a “good to have” feature in website development, yet it plays a crucial role in helping visitors locate relevant information. Search capabilities can be built into a website easily with tools such as Lucene, Solr, ElasticSearch and Haystack, among others.
This blog discusses Whoosh and how it can be integrated with the Tornado web server.
Whoosh is a fast search library developed by Matt Chaput that supports field-based full-text indexing and search, storage, text analysis, posting formats and scoring algorithms. You can benefit from features such as highlighted search results, fuzzy search, document-based search (“more like this”) and a spell checker (“did you mean”). Whoosh’s APIs are Pythonic, and the library is written in pure Python. 🙂
Let’s take a blog as an example, using the code snippet below.
import os

import tornado.ioloop
import tornado.web
import whoosh.fields
import whoosh.index
import whoosh.qparser


class Search(object):
    def __init__(self, indexdir, searchstr):
        self.indexdir = indexdir
        self.searchstr = searchstr

    def searcher(self):
        # Schema describing what a blog post looks like in the index.
        schema = whoosh.fields.Schema(
            path=whoosh.fields.ID(unique=True, stored=True),
            title=whoosh.fields.TEXT(stored=True, phrase=False),
            content=whoosh.fields.TEXT(),
            tag=whoosh.fields.TEXT(stored=True),
            category=whoosh.fields.TEXT(stored=True))
        if not os.path.exists(self.indexdir):
            os.mkdir(self.indexdir)
        # For simplicity, the index is recreated on every call.
        ix = whoosh.index.create_in(self.indexdir, schema)
        writer = ix.writer()
        writer.add_document(title=u"Welcome", content=u"This is welcome blog!",
                            path=u"/welcome", tag=u"Welcome", category=u"Welcome")
        writer.add_document(title=u"Python Whoosh", content=u"Whoosh search library in pure Python",
                            path=u"/whoosh", tag=u"whoosh", category=u"Search")
        writer.add_document(title=u"Python Tornado", content=u"Tornado Web Server for real-time web apps",
                            path=u"/tornado", tag=u"tornado", category=u"Web Server")
        writer.commit()
        # Parse the search string against the "content" field.
        _queryparser = whoosh.qparser.QueryParser('content', schema=schema)
        s = ix.searcher()
        # get_argument() already returns a unicode string, so no conversion is needed.
        return s.search(_queryparser.parse(self.searchstr), limit=50)


class Home(tornado.web.RequestHandler):
    def get(self):
        # Serve the search form.
        self.render('searchform.html')


class Srch(tornado.web.RequestHandler):
    def get(self):
        # Read the "q" query parameter and run the search.
        q = self.get_argument("q")
        srch = Search('./indexdir', q)
        results = srch.searcher()
        self.write(str(results))


application = tornado.web.Application([
    (r"/", Home),
    (r"/search", Srch),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.current().start()
In the above code:
- The Search class provides the search capability using Whoosh.
- The __init__ method accepts indexdir (the directory where the search index is created) and searchstr (the string to search for).
- The searcher() method first defines a document schema (the shape of a blog post) and creates an index in indexdir based on that schema. It then creates a writer object, which is used to add blog posts and commit them. Finally, it parses the search string against the content field and calls search(), which returns the results, capped at 50 hits.
- When the user browses to http://localhost:8888/, the search form is rendered.
<html>
  <head>
    <title>Search</title>
  </head>
  <body>
    <form action="/search" method="get">
      <input type="text" name="q">
      <input type="submit" name="submit">
    </form>
  </body>
</html>
- On submitting the search word, a GET request is sent to http://localhost:8888/search.
- The Srch class handles this request and in turn calls the Search class, which implements the search with Whoosh.
When the user searches for tornado, the application returns the output below, which shows that the word tornado was found in 1 document and that the search took about 0.0005 seconds.
<1/1 Results for Term('content', u'tornado', boost=1.0) runtime=0.000482797622681>
This is great! Could you also share the top features of Whoosh?
Thanks Vishal. Yes, Whoosh supports field-based full-text indexing and search, storage, text analysis, posting formats and scoring algorithms. You can benefit from features such as highlighted search results, fuzzy search and document-based search (“more like this”).
Whoosh’s APIs are Pythonic, and the library is written in pure Python. I will add this information to the blog as well.