Module s2.util.container.dense_hash_set

This is just a very thin wrapper over dense_hash_table.d, just like SGI STL's stl_hash_set is a very thin wrapper over stl_hashtable. The major thing we define is operator[], because we have a concept of a data_type which stl_hashtable doesn't (it only has a key and a value).

This is more different from dense_hash_map than you might think, because all iterators for sets are const (you obviously can't change the key, and for sets there is no value).

NOTE

This is exactly like sparse_hash_set.h, with the word "sparse" replaced by "dense", except for the addition of setEmptyKey().

You MUST call setEmptyKey() immediately after construction.

Otherwise your program will die in mysterious ways. (Note: if you use the constructor that takes an InputIterator range, you pass the empty key to the constructor rather than afterwards. As a result, this constructor differs from the standard STL version.)
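For illustration, here is a minimal construction sketch. setEmptyKey() is documented above and the denseHashSet() helper appears in the Functions table below; insert() and the exact instantiation/parameter defaults are assumptions mirroring the C++ sparsehash API:

```d
import s2.util.container.dense_hash_set;

void main() {
  // Assumed instantiation: denseHashSet() with default hash and
  // equality functions for the element type.
  auto ids = denseHashSet!int();

  // Mandatory first step: choose a sentinel value that will never be
  // inserted as a real element. Skipping this makes the table
  // misbehave, per the warning above.
  ids.setEmptyKey(-1);

  ids.insert(7);
  ids.insert(11);
  // (The InputIterator-range constructor instead takes the empty key
  // as a constructor argument, as noted above.)
}
```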

In other respects, we adhere mostly to the STL semantics for hash-set. One important exception is that insert() may invalidate iterators entirely: STL semantics are that insert() may reorder iterators, but they all still refer to something valid in the hashtable. Not so for us. Likewise, insert() may invalidate pointers into the hashtable. (Whether insert() invalidates iterators and pointers depends on whether it results in a hashtable resize.) On the plus side, erase() doesn't invalidate iterators or pointers at all, or even change the ordering of elements.
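As a hedged illustration of the invalidation rule (a sketch against the assumed D API; only setEmptyKey(), insert(), and erase() are named in this document):

```d
import s2.util.container.dense_hash_set;

void main() {
  auto s = denseHashSet!int();
  s.setEmptyKey(-1);
  s.insert(1);

  // Suppose we held an iterator or a pointer to an element here.
  // The bulk insert below may force a resize, after which every
  // previously obtained iterator and pointer must be treated as
  // invalid; re-acquire them after any insert() rather than reusing
  // old ones.
  foreach (i; 2 .. 1000)
    s.insert(i);

  // erase(), by contrast, never invalidates iterators or pointers.
}
```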

Here are a few "power user" tips:

1. setDeletedKey(): If you want to use erase() you must call setDeletedKey(), in addition to setEmptyKey(), after construction. The deleted and empty keys must differ; see the sketch after this list.

2. resize(0): When an item is deleted, its memory isn't freed right away. This allows you to iterate over a hashtable and call erase() without invalidating the iterator. To force the memory to be freed, call resize(0). For TR1 compatibility, this can also be called as rehash(0).

3. minLoadFactor(0.0): Setting the minimum load factor to 0.0 guarantees that the hash table will never shrink.
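A combined sketch of the three tips above. The names setDeletedKey(), resize(), rehash(), and minLoadFactor() are assumed D-style counterparts of the C++ sparsehash calls; adjust to the actual module API:

```d
import s2.util.container.dense_hash_set;

void main() {
  auto s = denseHashSet!int();
  s.setEmptyKey(-1);
  s.setDeletedKey(-2);  // Tip 1: required before erase(); must differ
                        // from the empty key.

  foreach (i; 0 .. 100)
    s.insert(i);

  // erase() only marks slots as deleted, so it is safe to erase while
  // iterating; no memory is released and no iterator is invalidated...
  foreach (i; 0 .. 50)
    s.erase(i);

  s.resize(0);          // Tip 2: ...and resize(0) later reclaims the
                        // memory held by deleted slots.

  s.minLoadFactor(0.0); // Tip 3: the table will now never shrink on
                        // its own.
}
```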

Roughly speaking:

1. dense_hash_set: fastest, uses the most memory unless entries are small
2. sparse_hash_set: slowest, uses the least memory
3. hash_set / unordered_set (STL): in the middle

Typically I use sparse_hash_set when I care about space and/or when I need to save the hashtable on disk. I use hash_set otherwise. I don't personally use dense_hash_set ever; some people use it for small sets with lots of lookups.

- dense_hash_set has, typically, about 78% memory overhead (if your data takes up X bytes, the hash_set uses 0.78X more bytes in overhead).
- sparse_hash_set has about 4 bits overhead per entry.
- sparse_hash_set can be 3-7 times slower than the others for lookup and, especially, inserts. See time_hash_map.cc for details.

See /usr/(local/)?doc/sparsehash-*/dense_hash_set.html for information about how to use this class.

Functions

| Name | Description |
| ---- | ----------- |
| denseHashSet(expected_max_items_in_table, hashFcn, equalKey) | Creates a DenseHashSet for a given type using default comparison functions. |
| equalTo(v1, v2) | Helper function for comparing two values. |
| hash(value) | Helper function for computing the hash of a basic type. |
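A hedged usage sketch of the denseHashSet() helper listed above; the string instantiation and the parameter defaults are assumptions:

```d
import s2.util.container.dense_hash_set;

void main() {
  // Sizing the table up front (expected_max_items_in_table) avoids
  // rehashes during an initial bulk load; hashFcn and equalKey are
  // assumed to default to the hash() and equalTo() helpers above.
  auto words = denseHashSet!string(64);
  words.setEmptyKey("");  // "" must then never be stored as an element.
  words.insert("s2");
  words.insert("hash");
}
```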

Classes

| Name | Description |
| ---- | ----------- |
| DenseHashSet | A set implemented via a hash table that has very few empty slots. |