Tornado – Whoosh – DidYouMean

Have you ever searched for a word on Google and, because you misspelled it, got a ‘Did you mean’ suggestion back? Would you like to implement the same feature in your own search engine?

Whoosh, a search library written in pure Python, can perform this ‘did you mean’ operation on the queries users submit. Did-you-mean presents suggestions for mistyped or misspelled queries, based on the terms present in the index. For now, Whoosh works more as a typo checker or corrector than a true spell checker, because it does not handle phonetics (words that sound alike but are spelled differently) well.
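The core idea of a typo corrector can be sketched in plain Python: compare the query word against every term in the index and keep the terms that are only a few edits away. This is a simplified illustration, not Whoosh's actual implementation (Whoosh uses its own, more efficient term lookup). The names levenshtein and suggest, the maxdist=2 threshold, and the small vocabulary (taken from the sample documents in the Tornado example later in this post) are all assumptions made for the example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def suggest(word, vocabulary, limit=3, maxdist=2):
    """Return up to `limit` terms within `maxdist` edits of `word`,
    closest first, ties broken alphabetically. The word itself is skipped."""
    word = word.lower()
    scored = []
    for term in vocabulary:
        d = levenshtein(word, term)
        if 0 < d <= maxdist:
            scored.append((d, term))
    scored.sort()
    return [term for _, term in scored[:limit]]


# Hypothetical vocabulary: lowercased terms from the sample documents
VOCAB = ["python", "tornado", "whoosh", "search", "server",
         "template", "welcome", "async"]

print(suggest("Torando", VOCAB))  # ['tornado']
print(suggest("piethon", VOCAB))  # ['python']
```

Both misspellings are exactly two edits away from the intended word, which is why a small distance threshold is enough for typos but cannot catch phonetic misspellings that differ by many letters.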

To find corrections, Whoosh looks for correctly spelled words in:

  • The index you have created
  • A file containing a list of words
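When the correct words come from a word-list file rather than the index, the lookup is the same idea applied to a plain list. The sketch below uses only the Python standard library and is not Whoosh's word-list API: it reads one word per line and ranks candidates with difflib.get_close_matches, which scores similarity with a sequence-matching ratio instead of edit distance (a deliberate swap to keep the example short). The helper names and the sample words are hypothetical.

```python
import difflib
import io


def load_word_list(lines):
    """Read one word per line from any iterable of lines (e.g. an open file),
    skipping blank lines and lowercasing each entry."""
    return sorted({line.strip().lower() for line in lines if line.strip()})


def suggest_from_words(word, words, limit=3, cutoff=0.6):
    """Return up to `limit` words whose similarity ratio to `word` is >= cutoff."""
    return difflib.get_close_matches(word.lower(), words, n=limit, cutoff=cutoff)


# io.StringIO stands in for open("words.txt"); the file name is hypothetical
words = load_word_list(io.StringIO("Python\nTornado\nWhoosh\n\n"))
print(words)                                 # ['python', 'tornado', 'whoosh']
print(suggest_from_words("piethon", words))  # ['python']
```

In a real setup you would pass an open file object to load_word_list instead of the StringIO stand-in.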

With Whoosh, developers choose which Schema fields the spell-checker uses. For instance, to spell-check the content of your documents, define the field ‘content’ in the Schema with ‘spelling=True’.
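To make the role of the spelling flag concrete, here is a toy model in plain Python that follows the description above; it is not Whoosh code. A dictionary stands in for the Schema, and only fields flagged for spelling contribute words to the correction vocabulary. The lowercasing and word splitting roughly imitate what an analyzer does; SPELLING_FIELDS and build_spelling_vocab are hypothetical names.

```python
import re

# Toy stand-in for a Whoosh Schema: field name -> spelling flag (hypothetical)
SPELLING_FIELDS = {"title": False, "content": True, "tag": False}


def build_spelling_vocab(docs, spelling_fields):
    """Collect lowercased word tokens from fields whose spelling flag is True."""
    vocab = set()
    for doc in docs:
        for field, text in doc.items():
            if spelling_fields.get(field, False):
                # Rough analyzer imitation: lowercase, then split into word tokens
                vocab.update(re.findall(r"\w+", text.lower()))
    return vocab


docs = [
    {"title": u"Python Tornado Async",
     "content": u"Tornado Web Server provides async web requests",
     "tag": u"async"},
]
print(sorted(build_spelling_vocab(docs, SPELLING_FIELDS)))
# ['async', 'provides', 'requests', 'server', 'tornado', 'web']
```

Because only ‘content’ is flagged here, ‘python’ (which appears only in the title) never becomes a suggestion candidate in this toy model.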

Here’s an example of Whoosh’s did-you-mean capability served through the Tornado web server.

Did You Mean input query form (didyoumean.html)


```html
<html>
<head>
<title>Search</title>
</head>
<body>
<form action="/didyoumean" method="POST">
Spell-Check for word: <input type="text" name="qstring">
<input type="submit" name="submit">
</form>
</body>
</html>
```


Tornado Web Server handling spell-checker requests


```python
import os

import tornado.escape
import tornado.ioloop
import tornado.web
import whoosh.fields
import whoosh.index


class Search(object):
    def __init__(self, indexdir, searchstr=None):
        self.indexdir = indexdir
        self.searchstr = searchstr

    def searcher(self):
        # 'content' is marked spelling=True so it can be used for suggestions
        schema = whoosh.fields.Schema(
            path=whoosh.fields.ID(unique=True, stored=True),
            title=whoosh.fields.TEXT(stored=True, phrase=False),
            content=whoosh.fields.TEXT(stored=True, spelling=True),
            tag=whoosh.fields.TEXT(stored=True),
            category=whoosh.fields.TEXT(stored=True))
        if not os.path.exists(self.indexdir):
            os.mkdir(self.indexdir)
        # create_in() builds a fresh index, replacing any previous one in indexdir
        ix = whoosh.index.create_in(self.indexdir, schema)
        writer = ix.writer()
        writer.add_document(title=u"Welcome", content=u"This is welcome blog!",
                            path=u"/welcome", tag=u"Welcome", category=u"Welcome")
        writer.add_document(title=u"Python Whoosh", content=u"Whoosh search library in pure Python",
                            path=u"/whoosh", tag=u"whoosh", category=u"Search")
        writer.add_document(title=u"Python Tornado", content=u"Tornado Web Server for real-time web apps",
                            path=u"/tornado", tag=u"tornado", category=u"Web Server")
        writer.add_document(title=u"Python Tornado Async", content=u"Tornado Web Server provides async web requests",
                            path=u"/tornadoasync", tag=u"async", category=u"Web Server")
        writer.add_document(title=u"Python Tornado Templates", content=u"Tornado Web Server has template feature",
                            path=u"/tornadotemplates", tag=u"templates", category=u"Web Server")
        writer.add_document(title=u"Python Tornado", content=u"Tornado Web Server is awesome",
                            path=u"/tornado", tag=u"great", category=u"Web Server")
        writer.commit()
        # The caller is responsible for closing the returned searcher
        return ix.searcher()


class Home(tornado.web.RequestHandler):
    def get(self):
        self.write('It Works!')


class DidYouMean(tornado.web.RequestHandler):
    def get(self):
        # Render the input form; by default Tornado looks for templates
        # in the directory of this script
        self.render('didyoumean.html')

    def post(self):
        qstring = self.get_argument('qstring')
        s = Search('./indexer').searcher()
        try:
            corrector = s.corrector("content")
            # The default TEXT analyzer lowercases index terms, so lowercase
            # the query word too before asking for up to 3 suggestions
            suggestions = corrector.suggest(qstring.strip().lower(), limit=3)
        finally:
            s.close()
        # Escape user-supplied text before echoing it into the HTML response
        head = "<h3>DidYouMean results for %s</h3><br />" % tornado.escape.xhtml_escape(qstring)
        # Separate suggestions instead of running them together
        hits = "<br />".join(tornado.escape.xhtml_escape(hit) for hit in suggestions)
        self.write(head + hits)


application = tornado.web.Application([
    (r"/", Home),
    (r"/didyoumean", DidYouMean),
])

if __name__ == "__main__":
    application.listen(7777)
    tornado.ioloop.IOLoop.current().start()
```


In this example, a user who searches for ‘Torando’ is offered Tornado, and one who types ‘piethon’ is offered Python (the suggestions come back as the lowercased terms stored in the index).
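To try these queries without a browser, a small standard-library client can send the same form POST that didyoumean.html sends. This sketch assumes the server above is running locally on port 7777, as set by application.listen(7777); build_request and did_you_mean are hypothetical helper names, and qstring is the form field name from the HTML form.

```python
import urllib.parse
import urllib.request

# Assumption: the Tornado app is listening locally on port 7777
DIDYOUMEAN_URL = "http://localhost:7777/didyoumean"


def build_request(word, url=DIDYOUMEAN_URL):
    """Build a form-encoded POST carrying the word in the 'qstring' field."""
    data = urllib.parse.urlencode({"qstring": word}).encode("ascii")
    return urllib.request.Request(url, data=data, method="POST")


def did_you_mean(word, url=DIDYOUMEAN_URL, timeout=5):
    """Send the request and return the HTML fragment the handler writes back."""
    with urllib.request.urlopen(build_request(word, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, calling did_you_mean("Torando") or did_you_mean("piethon") returns the same HTML fragment the browser would show after submitting the form.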
