Have you ever searched for a word on Google and received a ‘Did You Mean’ response because the word you typed was misspelled? And do you want to implement this feature in your own search engine?
Well, the Whoosh search engine can perform a ‘did you mean’ operation on the queries entered by the user. Did-you-mean essentially presents suggestions for mistyped or misspelled queries, based on the key terms present in the index. Whoosh currently works more as a typo checker or corrector, as it does not handle phonetics well.
For corrections, Whoosh looks up the correct words in:
- The created index
- A file containing a word list
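To make the idea of a typo corrector concrete, here is a minimal pure-Python sketch of edit-distance-based suggestions over both kinds of vocabulary: a set of index terms and a word list read line by line. It is not Whoosh's implementation; the names `edit_distance`, `load_words`, and `suggest`, and the sample words, are assumptions made up for illustration.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def load_words(lines):
    """Normalize an iterable of lines (e.g. an open word-list file) into a set."""
    return {line.strip().lower() for line in lines if line.strip()}


def suggest(word, vocabulary, limit=3, maxdist=2):
    """Return up to `limit` vocabulary words within `maxdist` edits of `word`."""
    word = word.lower()
    scored = []
    for candidate in vocabulary:
        dist = edit_distance(word, candidate)
        # Skip exact matches: they need no correction in this sketch.
        if 0 < dist <= maxdist:
            scored.append((dist, candidate))
    scored.sort()  # closest words first, ties broken alphabetically
    return [candidate for _, candidate in scored[:limit]]


# Hypothetical vocabularies: terms taken from an index, and a word list.
index_terms = {"tornado", "python", "whoosh", "web", "server", "search"}
file_words = load_words(["Template\n", "async\n", "\n", "library\n"])

print(suggest("Torando", index_terms))   # ['tornado']
print(suggest("piethon", index_terms))   # ['python']
print(suggest("templete", file_words))   # ['template']
```

Because the comparison is purely character based, a word that only sounds similar but is spelled very differently falls outside the distance limit, which matches the note above about phonetics.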
With Whoosh, developers can define which Schema fields are used by the spell-checker. For instance, to spell-check the contents, simply define the ‘content’ field in the Schema with ‘spelling=True’.
Here’s an example of Whoosh’s did-you-mean capability with the Tornado web server:
Did You Mean input query form
    <html>
      <head>
        <title> Search </title>
      </head>
      <body>
        <FORM action="/didyoumean" method="POST">
          Spell-Check for word:<input type=text name=qstring>
          <input type="submit" name="submit">
        </FORM>
      </body>
    </html>
Tornado Web Server handling spell-checker requests
    import os

    import tornado.escape
    import tornado.ioloop
    import tornado.web
    import whoosh.fields
    import whoosh.index


    class Search(object):
        def __init__(self, indexdir, searchstr=None):
            self.indexdir = indexdir
            self.searchstr = searchstr

        def searcher(self):
            # 'content' is the field used for spelling suggestions.
            schema = whoosh.fields.Schema(
                path=whoosh.fields.ID(unique=True, stored=True),
                title=whoosh.fields.TEXT(stored=True, phrase=False),
                content=whoosh.fields.TEXT(stored=True, spelling=True),
                tag=whoosh.fields.TEXT(stored=True),
                category=whoosh.fields.TEXT(stored=True))
            if not os.path.exists(self.indexdir):
                os.mkdir(self.indexdir)
            # Build the sample index only once and reuse it on later requests.
            if whoosh.index.exists_in(self.indexdir):
                ix = whoosh.index.open_dir(self.indexdir)
            else:
                ix = whoosh.index.create_in(self.indexdir, schema)
                writer = ix.writer()
                writer.add_document(title=u"Welcome", content=u"This is welcome blog!",
                                    path=u"/welcome", tag=u"Welcome", category=u"Welcome")
                writer.add_document(title=u"Python Whoosh", content=u"Whoosh search library in pure Python",
                                    path=u"/whoosh", tag=u"whoosh", category=u"Search")
                writer.add_document(title=u"Python Tornado", content=u"Tornado Web Server for real-time web apps",
                                    path=u"/tornado", tag=u"tornado", category=u"Web Server")
                writer.add_document(title=u"Python Tornado Async", content=u"Tornado Web Server provides async web requests",
                                    path=u"/tornadoasync", tag=u"async", category=u"Web Server")
                writer.add_document(title=u"Python Tornado Templates", content=u"Tornado Web Server has template feature",
                                    path=u"/tornadotemplates", tag=u"templates", category=u"Web Server")
                writer.add_document(title=u"Python Tornado", content=u"Tornado Web Server is awesome",
                                    path=u"/tornado", tag=u"great", category=u"Web Server")
                writer.commit()
            return ix.searcher()


    class Home(tornado.web.RequestHandler):
        def get(self):
            self.write('It Works!')


    class DidYouMean(tornado.web.RequestHandler):
        def get(self):
            self.render('didyoumean.html')

        def post(self):
            qstring = self.get_argument('qstring')
            s = Search('./indexer').searcher()
            try:
                corrector = s.corrector("content")
                # Indexed terms are lowercased by the default analyzer, so
                # lowercase the query before asking for suggestions.
                suggestions = corrector.suggest(qstring.lower(), limit=3)
            finally:
                s.close()
            # Escape the user input before echoing it back as HTML.
            head = "<h3>DidYouMean results for %s</h3><br />" % tornado.escape.xhtml_escape(qstring)
            self.write(head + ", ".join(suggestions))


    application = tornado.web.Application([
        (r"/", Home),
        (r"/didyoumean", DidYouMean),
    ])

    if __name__ == "__main__":
        application.listen(7777)
        tornado.ioloop.IOLoop.current().start()
In this example, if the user searches for the word ‘Torando’, the suggestion is ‘tornado’, and a search for ‘piethon’ suggests ‘python’.