Tangentially related, but I like how readable the source code of CPython (the official interpreter for Python) is. It is plain C without many shenanigans. If you wonder how Python handles big integer, hashmaps, and so on, just dive in the source code!
A few days ago, I wanted a concrete example about dynamic arrays. The examples in computer science textbook always uses a factor of 2 when expanding, since it is very convenient to demonstrate that it gives an amortized O(1) insertion time. Some people find it wasteful, and may resort to use a constant expansion (e.g. 128) instead, leading to a O(n / 128), i.e. O(n) insertion time instead.
But you can just make the expansion factor less than 2. I wanted a concrete example, so, naturally, I just went for CPython's implementation of `list`. It took me less than 2 minutes to find it: https://github.com/python/cpython/blob/main/Objects/listobje....
If you're curious, CPython essentially grows its dynamic array by ⅛ (12.5%) whenever it needs to.
By C standards that’s perfectly fine. No crazy macros, just plain old bit-twiddling. The and-not combination is a very traditional pattern for bit masking.
What a fantastic post and explanation, big kudos to the author. I recently went from knowing next to nothing about hashtables, to implementing my own in C using simple separate chaining, and was surprised to find out that it's around twice as slow as an equivalent CPython `dict` benchmark, while also only being around 20% slower than an equivalent Golang `map` benchmark.
For inserting and looking up 10,000,000 integer values with string keys I get 6.4s for both Python and SBCL and 5.2 seconds for C++ (std::string, unordered_map).
Python's dict implementation (in C) is good and the interpreter loop overhead is dwarfed by the dict overhead.
SBCL's implementation is in Lisp and achieves the same speed.
(For the general case of Python speed this benchmark is of course useless, since raw dicts written in C cannot be optimized much. Nevertheless it is sometimes quoted as "evidence" that "Python is almost as fast as C++" ...)
In such an environment my implementation achieves a runtime of around 5.8s while Python3 manages 5.2s.
41% of the runtime of my impl. is spent on resizing (19 times) + finally freeing (once) the hashtable.
It's interesting that in your testing unordered_map managed a faster runtime than Python. I've attributed my implementation's defeat to the fact that it uses separate chaining while CPython uses open addressing (+ a bunch of other clever optimizations...), but seeing as unordered_map also uses separate chaining (as mandated by the C++ standard, apparently) and it managed a win in your testing, I'm interested to see how they do things.
For those who learn better from videos, here's a 37 minute Pycon 2017 talk that covers all of the major aspects of Python dictionaries including how they became ordered and compacted, how object dictionaries share keys, why full hashes are stored, how the probing sequence works, when resizing occurs, etc.
A few days ago, I wanted a concrete example about dynamic arrays. The examples in computer science textbook always uses a factor of 2 when expanding, since it is very convenient to demonstrate that it gives an amortized O(1) insertion time. Some people find it wasteful, and may resort to use a constant expansion (e.g. 128) instead, leading to a O(n / 128), i.e. O(n) insertion time instead.
But you can just make the expansion factor less than 2. I wanted a concrete example, so, naturally, I just went for CPython's implementation of `list`. It took me less than 2 minutes to find it: https://github.com/python/cpython/blob/main/Objects/listobje....
If you're curious, CPython essentially grows its dynamic array by ⅛ (12.5%) whenever it needs to.