100,000+ hits on TechnoBeans

Thanks, guys, for your overwhelming response to my blog posts over the last couple of years.

Technobeans has crossed 100,000 hits, and the credit goes to all of you who viewed the content and referred it to your friends and colleagues. While this is a great milestone, I hope you will continue to look forward to more blogs, articles, books and open source projects on Python, Java, NodeJS and many more interesting topics to follow.

Occasions like these keep me motivated, to say the least.

Thanks again!

Node.js

This is an introductory post on Node.js.

Node.js is an event-driven, non-blocking (asynchronous) I/O platform used to develop server-side applications. (If you’re a Python fan, it’s like programming in Twisted!) It is built on Google’s V8 JavaScript engine. Like other event-driven servers, Node.js runs an event loop and handles events asynchronously by invoking callbacks.
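If the event loop idea is new to you, here is a minimal sketch in Python (not Node.js code) using the standard library’s asyncio loop. The function name run_demo and the timer delays are made up for illustration; the only point is that callbacks are registered first and the loop invokes them later, as their events become ready.

```python
import asyncio


def run_demo():
    """Register timer callbacks on one event loop and record the order they fire in."""
    order = []
    loop = asyncio.new_event_loop()
    try:
        # Registering a callback returns immediately; nothing blocks here.
        loop.call_later(0.10, order.append, "slow timer fired")
        loop.call_later(0.05, order.append, "fast timer fired")
        loop.call_later(0.30, loop.stop)  # stop the loop once both timers are done
        order.append("callbacks registered")
        loop.run_forever()  # the loop now dispatches callbacks as timers expire
    finally:
        loop.close()
    return order


if __name__ == "__main__":
    print(run_demo())
```

The slow timer is registered first but fires last: the loop, not the order of the code, decides when each callback runs.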

A typical ‘hello world’ web server implementation is described below. Run as: node helloworld.js

In this example,

1. We import the http module and create an HTTP server that listens on port 8888.

2. When the user makes a GET request to http://localhost:8888, the web server renders Hello World in the web browser. You can play around with the request and response variables used in the server code.

3. It’s interesting to note that if the response.end() statement is commented out, the request doesn’t complete and the server appears to hang. Only when you press Ctrl+C does the request complete and ‘Hello World’ get rendered in the browser. So server developers, beware!

You may ask, what’s this function(request, response) and why is it anonymous? Well, that’s a motivation for you to read my next post! 🙂

Selenium with Python bindings

After a lot of posts on the Tornado web server and on understanding BDD, let’s get to testing our website. What better than to use Selenium? Let’s go through the setup and create our first test.

Prerequisites

1. Python bindings for Selenium – Go to the Selenium site and download the package

Install as:

  • tar xvf selenium-2.25.0.tar.gz
  • cd selenium-2.25.0
  • sudo python setup.py install

2. Java Server – Download the server from here

Run as:

  • java -jar selenium-server-standalone-2.25.0.jar

Here we discuss using Selenium 2.0 WebDriver with and without the Selenium server. An example of each follows.

Just a bit of history first… WebDriver aims to improve on Selenium 1.0 Remote Control. The distinguishing factors are:

  • Object Oriented APIs
  • More features
  • WebDriver uses the APIs exported by the browser for automated testing, while Selenium Remote Control injects JavaScript to run the test

Web Driver without selenium server

from selenium import webdriver

browser = webdriver.Firefox()  # Start a local Firefox session
browser.get("http://www.yahoo.com")  # Load the page
assert "Yahoo!" in browser.title  # The page title should mention Yahoo!
browser.quit()  # Close the browser and end the session

Web Driver with selenium server – WebDriver Remote

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Connect to the Selenium server started earlier (it listens on port 4444)
driver = webdriver.Remote(
    command_executor='http://127.0.0.1:4444/wd/hub',
    desired_capabilities=DesiredCapabilities.FIREFOX)

driver.get("http://www.python.org")  # Load the page
driver.quit()  # End the remote session and close the browser

BDD in Python with lettuce

Behavior Driven Development, also known as BDD, is a concept developed by Dan North that builds on the popular and well-adopted practice of TDD (Test Driven Development). In Dan’s words:

‘BDD is a second-generation, outside–in, pull-based, multiple-stakeholder, multiple-scale, high-automation, agile methodology. It describes a cycle of interactions with well-defined outputs, resulting in the delivery of working, tested software that matters.’

BDD provides a framework in which QA, business analysts and other stakeholders communicate and collaborate on software development. While TDD emphasizes developing tests for a unit of code, BDD insists on developing tests for the business scenarios, use cases or behavioral specification of the software being developed. According to Dan, BDD tests should be written as user stories, ‘As a [role] I want [feature] so that [benefit]’, and acceptance criteria should be defined as ‘Given [initial context], when [event occurs], then [ensure some outcomes]’.

lettuce is typically used in Python to implement BDD. This post covers the installation of lettuce on Ubuntu and its application, with an example of a Fibonacci function.

Installation

buntu@ubuntu:~$ sudo pip install lettuce
[sudo] password for buntu:
Downloading/unpacking lettuce
Downloading lettuce-0.2.9.tar.gz (40Kb): 40Kb downloaded
Running setup.py egg_info for package lettuce
Downloading/unpacking sure (from lettuce)
Downloading sure-1.0.6.tar.gz
Running setup.py egg_info for package sure
Downloading/unpacking fuzzywuzzy (from lettuce)
Downloading fuzzywuzzy-0.1.tar.gz
Running setup.py egg_info for package fuzzywuzzy
Installing collected packages: fuzzywuzzy, lettuce, sure
Running setup.py install for lettuce
Installing lettuce script to /usr/local/bin
Running setup.py install for sure
Running setup.py install for fuzzywuzzy
Successfully installed lettuce

Setup

Let’s first create a directory structure that looks like this:

buntu@ubuntu:~$ tree lettucetests/
lettucetests/
|-- features
|   |-- fib.feature
|   |-- test.py
`-- test.feature

1 directory, 3 files

Define Features

Write Tests
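
The feature file and step definitions from this post are not reproduced here, so what follows is only a self-contained sketch of the Given/When/Then idea, written with the standard library instead of lettuce. The feature wording, the fib implementation and the tiny step registry (step, run_feature) are all assumptions made for illustration; lettuce ships its own step decorator and runner.

```python
import re

# Hypothetical feature text in the Given/When/Then style described above.
FEATURE = """\
Feature: Compute fibonacci numbers
  Scenario: Fibonacci of 6
    Given I have the number 6
    When I compute its fibonacci
    Then I see the number 8
"""


def fib(n):
    """Return the n-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


STEPS = []  # (compiled pattern, handler) pairs, filled in by the decorator below


def step(pattern):
    """Register a handler for every step sentence that matches the pattern."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"I have the number (\d+)")
def have_the_number(ctx, number):
    ctx["number"] = int(number)


@step(r"I compute its fibonacci")
def compute_fibonacci(ctx):
    ctx["result"] = fib(ctx["number"])


@step(r"I see the number (\d+)")
def see_the_number(ctx, expected):
    assert ctx["result"] == int(expected), ctx["result"]


def run_feature(text):
    """Run each Given/When/Then line against the registered steps; return the step count."""
    ctx, executed = {}, 0
    for line in text.splitlines():
        sentence = re.sub(r"^\s*(Given|When|Then|And)\s+", "", line)
        if sentence == line:
            continue  # Feature/Scenario headers are not steps
        for pattern, handler in STEPS:
            match = pattern.fullmatch(sentence.strip())
            if match:
                handler(ctx, *match.groups())
                executed += 1
                break
        else:
            raise LookupError("no step defined for: " + sentence)
    return executed
```

Calling run_feature(FEATURE) executes the three steps in order and raises an AssertionError if the Then step does not hold.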

Django setup on Ubuntu

Setting up a Django website calls for (though not always):

  • Django installation
  • configuring Apache
  • mod_wsgi
  • others like database servers, static file servers, etc.

Now, if you are developing a small-scale website, you may not want to go the Apache and mod_wsgi way. Django helps here by providing a development web server, so that you can get your website up and running rapidly.

This post walks through setting up a Django website on Ubuntu 10.04:

Step 1: Get python-pip

buntu@ubuntu:~$ sudo apt-get install python-pip 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libtext-glob-perl libcompress-bzip2-perl libparams-util-perl libfile-chmod-perl libdata-compare-perl libfile-pushd-perl libfile-which-perl
  libcpan-inject-perl libfile-find-rule-perl libcpan-checksums-perl libnumber-compare-perl
Use 'apt-get autoremove' to remove them.
The following extra packages will be installed:
  python-setuptools
The following NEW packages will be installed:
  python-pip python-setuptools
0 upgraded, 2 newly installed, 0 to remove and 186 not upgraded.
Need to get 262kB of archives.
After this operation, 1,192kB of additional disk space will be used.
Do you want to continue [Y/n]? 
Get:1 http://us.archive.ubuntu.com/ubuntu/ lucid/main python-setuptools 0.6.10-4ubuntu1 [213kB]
Get:2 http://us.archive.ubuntu.com/ubuntu/ lucid-updates/universe python-pip 0.3.1-1ubuntu2.1 [49.8kB]
Fetched 262kB in 5s (50.6kB/s)      
Selecting previously deselected package python-setuptools.
(Reading database ... 124327 files and directories currently installed.)
Unpacking python-setuptools (from .../python-setuptools_0.6.10-4ubuntu1_all.deb) ...
Selecting previously deselected package python-pip.
Unpacking python-pip (from .../python-pip_0.3.1-1ubuntu2.1_all.deb) ...
Processing triggers for man-db ...
Setting up python-setuptools (0.6.10-4ubuntu1) ...

Processing triggers for python-central ...
Setting up python-pip (0.3.1-1ubuntu2.1) ...

Step 2: Install django

buntu@ubuntu:~$ sudo pip install django
Downloading/unpacking django
  Downloading Django-1.4.1.tar.gz (7.7Mb): 7.7Mb downloaded
  Running setup.py egg_info for package django
Installing collected packages: django
  Running setup.py install for django
    changing mode of build/scripts-2.6/django-admin.py from 644 to 755
    changing mode of /usr/local/bin/django-admin.py to 755
Successfully installed django

Step 3: Check for django installation

buntu@ubuntu:~$ python -c "import django; print(django.get_version())"
1.4.1

Step 4: Create a project site

buntu@ubuntu:~$ django-admin.py startproject mysite
buntu@ubuntu:~$ tree mysite
mysite
|-- manage.py
`-- mysite
    |-- __init__.py
    |-- settings.py
    |-- urls.py
    `-- wsgi.py

1 directory, 5 files

Step 5: Start django development server

buntu@ubuntu:~$ cd mysite
buntu@ubuntu:~/mysite$ python manage.py runserver
Validating models...

0 errors found
Django version 1.4.1, using settings 'mysite.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

Step 6: Browse to the home page at http://127.0.0.1:8000/

The development server logs the request:

[09/Oct/2012 01:42:52] "GET / HTTP/1.1" 200 1957

Abstraction in Search

Another great event I attended this year! PyCon India 2012 was better organized and had better talks, a bigger audience, a job fair and more fun than ever before 🙂 Not to forget the evening dinner for speakers 😉 I loved every bit of it: talking to experts, talking to Python enthusiasts, answering their questions and wondering why I was not like the younger folks when I was younger 😛

Vishal and I delivered a talk on ‘Rapid development of website search in Python’. We spoke about:

  • Why Search is imperative in web sites
  • How is the Schema defined and Analyzers chosen
  • How indexing, and searching works with appropriate flowcharts
  • How search can be easily integrated with your web application
  • What are the design and development considerations for implementing it

We also shared our observations on the facets of a good search solution. It should be:

  • Integral to the website development
  • Decoupled from the web framework used for website development
  • Adaptable (to the scale and requirements of the website)
  • And most importantly, rapidly developed and deployed

This talk prompted a new design concept, Abstraction in Search (never tried before, as far as we know), which we contributed to the Python community at large.

Preface

We all understand that no single solution fits two different problems. The same applies to search engines. One search engine may have strong indexing and committing capabilities but a slower search algorithm than an equally feature-rich engine. Hence a search engine may be the best solution for one website but an utter misfit for another.

Problem

Now, developing search with one particular algorithm or engine and plugging it into every website you develop is no less than digging your own grave! Why the h**l would you assume that the one search solution you’ve developed for your large-scale website is suitable for small or medium-sized websites?

Solution

We propose developing customized search engines that are adaptable to small, medium and large websites. Once you have the search engine implementations, develop an Abstraction Layer over these engines. The Abstraction Layer would ensure:

  • Freedom to choose an engine based on its applicability and adaptability to the website
  • Develop once, reuse many times
  • The search engine to call can be decided at run time

The abstraction layer could be implemented using the well-known facade pattern!
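
To make the facade idea concrete, here is a minimal sketch under stated assumptions. It is not the fsMgr code: the two toy engines (LinearScanEngine, InvertedIndexEngine), the ENGINES registry and the Search facade are hypothetical names invented for this illustration. What it shows is one interface in front of interchangeable engines, with the engine picked by name at run time.

```python
class LinearScanEngine:
    """Hypothetical engine for small sites: cheap indexing, scans every document per query."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, text):
        self.docs[doc_id] = text.lower().split()

    def search(self, query):
        word = query.lower()
        return sorted(doc_id for doc_id, words in self.docs.items() if word in words)


class InvertedIndexEngine:
    """Hypothetical engine for larger sites: more work at index time, direct lookup per query."""

    def __init__(self):
        self.postings = {}

    def index(self, doc_id, text):
        for word in set(text.lower().split()):
            self.postings.setdefault(word, set()).add(doc_id)

    def search(self, query):
        return sorted(self.postings.get(query.lower(), set()))


# The website developer picks an engine by name; adding an engine means adding an entry here.
ENGINES = {"small": LinearScanEngine, "large": InvertedIndexEngine}


class Search:
    """Facade: callers see index() and search() only, never the engine behind them."""

    def __init__(self, engine="small"):
        self._engine = ENGINES[engine]()  # engine chosen at run time

    def index(self, doc_id, text):
        self._engine.index(doc_id, text)

    def search(self, query):
        return self._engine.search(query)
```

Because the calling code only ever touches Search, switching a site from "small" to "large" is a one-word change where the facade is created.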

Design

We propose a simple-to-understand SVC model (based on the MVC model). SVC stands for Search View Controller. In SVC, the Controller calls search.py with the appropriate search engine to find the search results for the user’s input keywords. search.py is an abstraction built over search engine implementations that can adapt to small, medium and large websites. The choice of search solution behind the search.py abstraction rests with the website developer (as s/he understands the requirements of the website and the search solution for it). The selected search engine then generates the search results for the input query terms and passes them on to the Controller via search.py. The Controller then applies the search results to the View (templates) and renders the results to the user.
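
The request flow described above can be sketched in a few lines. This is an illustration only, with no web server involved: PAGES, small_site_engine, search, controller and RESULTS_VIEW are hypothetical stand-ins for the site content, an engine, search.py, the Controller and the View respectively.

```python
from string import Template

# View: a template that only knows how to present results.
RESULTS_VIEW = Template("Results for '$query': $hits")

# Hypothetical site content, keyed by URL.
PAGES = {
    "/about": "About our Python team",
    "/blog": "Tornado and Python posts",
}


def small_site_engine(query):
    """Hypothetical engine: scan every page for the query word."""
    word = query.lower()
    return sorted(url for url, text in PAGES.items() if word in text.lower().split())


def search(query, engine=small_site_engine):
    """Stand-in for search.py: hand the query to whichever engine the developer chose."""
    return engine(query)


def controller(query):
    """Controller: take the user's query, call search, apply the results to the view."""
    hits = search(query)
    return RESULTS_VIEW.substitute(query=query, hits=", ".join(hits) or "none")


if __name__ == "__main__":
    print(controller("python"))
```

The Controller never imports an engine directly, which is what keeps search decoupled from the web framework.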

Prototype Implementation

We’ve developed a prototype for the idea discussed above (termed fsMgr). fsMgr assumes that the webpages that need to be searched are already available (or scraped) in a tree structure.

search.py of fsMgr abstracts the Whoosh and PyLucene search engines. By doing this, we demonstrate how either of these engines can be leveraged for website search, based on the website’s requirements.

We use Python’s Tornado web server as the Controller. It provides the request handling capabilities we need to expose simple search and advanced search features (such as highlighted search, a didyoumean spell-checker and a morelikethis document searcher) to users.

Tornado’s template capabilities are used as Views in this prototype.

Code

The source code of this prototype implementation is available at fsMgr.

SVC Architecture

Tornado – Redis

As defined on the Redis website, it is an open source, advanced key-value store. It is often referred to as a data structure server, since keys can contain strings, hashes and lists, among others. It is similar to, say, memcached in the sense that it is an in-memory key/value store, but it also persists data on disk. Redis has gained importance as a NoSQL option for web development because of its speed (GET and SET operations in the range of 100,000 per second). This post is about using Redis with Tornado for web development.

Let’s start with installing Redis-server

ubuntu@ubuntu:~/tornado-2.2$ sudo apt-get install redis-server
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libtext-glob-perl libcompress-bzip2-perl libparams-util-perl libfile-chmod-perl libdata-compare-perl libfile-pushd-perl libfile-which-perl libcpan-inject-perl
  libfile-find-rule-perl libcpan-checksums-perl libnumber-compare-perl
Use 'apt-get autoremove' to remove them.
The following NEW packages will be installed:
  redis-server
0 upgraded, 1 newly installed, 0 to remove and 171 not upgraded.
Need to get 80.8kB of archives.
After this operation, 283kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu/ lucid/universe redis-server 2:1.2.0-1 [80.8kB]
Fetched 80.8kB in 2s (27.3kB/s) 
Selecting previously deselected package redis-server.
(Reading database ... 138423 files and directories currently installed.)
Unpacking redis-server (from .../redis-server_2%3a1.2.0-1_i386.deb) ...
Processing triggers for man-db ...
Processing triggers for ureadahead ...
Setting up redis-server (2:1.2.0-1) ...
Starting redis-server: redis-server.

Confirmation

ubuntu@ubuntu:~/tornado-2.2$ ps aux | grep redis
redis    19104  0.0  0.1   2284   716 ?        Ss   21:23   0:00 /usr/bin/redis-server /etc/redis/redis.conf

Python client library for redis

ubuntu@ubuntu:~/tornado-2.2$ sudo pip install redis
Downloading/unpacking redis
  Downloading redis-2.6.2.tar.gz
  Running setup.py egg_info for package redis
Installing collected packages: redis
  Running setup.py install for redis
Successfully installed redis

 

With Redis installed, let’s go to an example where Redis meets Tornado. In the example below,

  • When the web server is started, the Redis server is initialized with key-value pairs of username and password (the password is the MD5 hash of the username in hex format) for users ‘bob’ and ‘clara’.
  • On browsing to http://localhost:8888/login and POSTing the username and password to the Tornado web server, the details are authenticated against the Redis server.
  • A relevant message for the successful/unsuccessful attempt is rendered in the user’s browser.
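
The seeding and authentication steps above can be sketched without a running server. This is a hedged illustration, not the post’s Tornado handler: InMemoryStore is a hypothetical stand-in exposing only get and set (the two operations used here, which a Redis client also provides), and password_for, seed_users and authenticate are names made up for this sketch.

```python
import hashlib


class InMemoryStore:
    """Hypothetical stand-in for a Redis client: only get() and set() are needed here."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None for a missing key, like Redis GET


def password_for(username):
    """Password scheme from the list above: MD5 of the username, as a hex string."""
    return hashlib.md5(username.encode("utf-8")).hexdigest()


def seed_users(store, usernames=("bob", "clara")):
    """Done once when the web server starts: store username -> password pairs."""
    for name in usernames:
        store.set(name, password_for(name))


def authenticate(store, username, password):
    """Compare the POSTed password with the stored one; return a message for the browser."""
    stored = store.get(username)
    if isinstance(stored, bytes):  # a real client may hand back bytes instead of str
        stored = stored.decode("utf-8")
    if stored is not None and stored == password:
        return "Login successful for " + username
    return "Login failed"
```

In a Tornado application, the POST handler for /login would read the two form fields, call authenticate with them, and write the returned message to the response.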