Performance for testing memberships: list vs tuples vs sets

Sets in Python are often used for two purposes:

1. Removing the duplicate entries in a collection

2. For membership testing.

By membership, here we mean to find existence of element in a collection

The focus of this post is to evaluate performance of list, tuple and set data structures with respect to each other on the basis of membership testing.

How do I do it? Well, some code, some data collection and some analysis

[sourcecode language=”python”]
from datetime import datetime
listA = range(10000000)
setA = set(listA)
tupA = tuple(listA)

def calc(data, type):
start = datetime.now()
if data in type:
print ""
end = datetime.now()
print end-start

calc(9999, listA)
calc(9999, tupA)
calc(9999, setA)
[/sourcecode]

Below is the average of 10 iterations for the time taken by lists/tuples and sets for testing membership of 9999, 99999, 999999 in 10000000

Search 9999 in  10000000
list tuple set
0.8 0.8 1.9
Search 99999 in  10000000
list tuple set
2.6 2.8 1.7
Search 999999 in  10000000
list tuple set
21.4 21.6 0.8

Conclusions

1. for testing membership of elements at the beginning of the collection, lists or tuples perform more than 100% better than sets

2. As soon as you start testing membership of elements that are in the middle or end of the collection, sets perform 40% – 1800% better than lists or tuples

You now have a fair idea why you should think of using sets for large collections…

Note

Hardware I used: windows 7, x64, 4GB RAM

Python version: 2.7.2

2 thoughts on “Performance for testing memberships: list vs tuples vs sets

  1. Interesting! could we have a mechanism or an intelligent API where given a list, the membership query function transparently chooses the best construct among set, tuples or list before querying on the same list?

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.