Sets in Python are often used for two purposes:
1. Removing the duplicate entries in a collection
2. For membership testing.
By membership, here we mean to find existence of element in a collection
The focus of this post is to evaluate performance of list, tuple and set data structures with respect to each other on the basis of membership testing.
How do I do it? Well, some code, some data collection and some analysis
from datetime import datetime
listA = range(10000000)
setA = set(listA)
tupA = tuple(listA)
def calc(data, type):
start = datetime.now()
if data in type:
end = datetime.now()
Below is the average of 10 iterations for the time taken by lists/tuples and sets for testing membership of 9999, 99999, 999999 in 10000000
|Search 9999 in 10000000|
|Search 99999 in 10000000|
|Search 999999 in 10000000|
1. for testing membership of elements at the beginning of the collection, lists or tuples perform more than 100% better than sets
2. As soon as you start testing membership of elements that are in the middle or end of the collection, sets perform 40% – 1800% better than lists or tuples
You now have a fair idea why you should think of using sets for large collections…
Hardware I used: windows 7, x64, 4GB RAM
Python version: 2.7.2
2 thoughts on “Performance for testing memberships: list vs tuples vs sets”
Interesting! could we have a mechanism or an intelligent API where given a list, the membership query function transparently chooses the best construct among set, tuples or list before querying on the same list?
Nice Work, Thanks !