Issue 1188: Unordered containers should have a minimum load factor as well as a maximum

Title: Unordered containers should have a minimum load factor as well as a maximum
Status: nad
Section: [unord.req][unord]
Submitter: Matt Austern

Created on 2009-08-10.00:00:00 last changed 98 months ago

Messages

Date: 2018-06-22.06:38:21

Proposed resolution:

Add two new rows, and change rehash's postcondition in the unordered associative container requirements table in [unord.req]:

Table 87 — Unordered associative container requirements (in addition to container)
Expression Return type Assertion/note pre-/post-condition Complexity

a.min_load_factor() float Returns a non-negative number that the container attempts to keep the load factor greater than or equal to. The container automatically decreases the number of buckets as necessary to keep the load factor above this number. constant

a.min_load_factor(z) void Pre: z shall be non-negative. Changes the container's minimum load factor, using z as a hint. [Footnote: the minimum load factor should be significantly smaller than the maximum. If z is too large, the implementation may reduce it to a more sensible value.] constant

a.rehash(n) void Post: a.bucket_count() >= n, and a.size() <= a.bucket_count() * a.max_load_factor(). [Footnote: It is intentional that the postcondition does not mention the minimum load factor. This member function is primarily intended for cases where the user knows that the container's size will increase soon, in which case the container's load factor will temporarily fall below a.min_load_factor().] ~~a.bucket_cout > a.size() / a.max_load_factor() and a.bucket_count() >= n.~~ Average case linear in a.size(), worst case quadratic.

Table 87 — Unordered associative container requirements (in addition to container)
Expression	Return type	Assertion/note pre-/post-condition	Complexity
`a.min_load_factor()`	`float`	Returns a non-negative number that the container attempts to keep the load factor greater than or equal to. The container automatically decreases the number of buckets as necessary to keep the load factor above this number.	constant
`a.min_load_factor(z)`	`void`	Pre: `z` shall be non-negative. Changes the container's minimum load factor, using `z` as a hint. [Footnote: the minimum load factor should be significantly smaller than the maximum. If `z` is too large, the implementation may reduce it to a more sensible value.]	constant
`a.rehash(n)`	`void`	Post: `a.bucket_count() >= n`, and `a.size() <= a.bucket_count() * a.max_load_factor()`. [Footnote: It is intentional that the postcondition does not mention the minimum load factor. This member function is primarily intended for cases where the user knows that the container's size will increase soon, in which case the container's load factor will temporarily fall below `a.min_load_factor()`.] ~~`a.bucket_cout > a.size() / a.max_load_factor()` and `a.bucket_count() >= n`.~~	Average case linear in `a.size()`, worst case quadratic.

Add a footnote to [unord.req] p12:

The insert members shall not affect the validity of references to container elements, but may invalidate all iterators to the container. The erase members shall invalidate only iterators and references to the erased elements.

[A consequence of these requirements is that while insert may change the number of buckets, erase may not. The number of buckets may be reduced on calls to insert or rehash.]

Change paragraph 13:

The insert members shall not affect the validity of iterators if ~~(N+n) < z * B~~ zmin * B <= (N+n) <= zmax * B, where N is the number of elements in the container prior to the insert operation, n is the number of elements inserted, B is the container's bucket count, zmin is the container's minimum load factor, and zmax is the container's maximum load factor.

Add to the unordered_map class synopsis in section [unord.map], the unordered_multimap class synopsis in [unord.multimap], the unordered_set class synopsis in [unord.set], and the unordered_multiset class synopsis in [unord.multiset]:


float min_load_factor() const;
void min_load_factor(float z);

In [unord.map.cnstr], [unord.multimap.cnstr], [unord.set.cnstr], and [unord.multiset.cnstr], change:

... max_load_factor() returns 1.0 and min_load_factor() returns 0.

msg5442 (view)

Date: 2018-06-22.06:38:21

[ LEWG Kona 2017 ]

Should there be a shrink_to_fit()? Is it too surprising to shrink on insert()? (We understand that shrinking on erase() is not an option.) Maybe make people call rehash(0) to shrink to the min_load_factor? On clear(), the load factor goes to 0 or undefined (0/0), which is likely to violate min_load_factor() min_load_factor(z)'s wording should match max_load_factor(z)'s, e.g. "May change the container’s maximum load factor" Want a paper exploring whether shrink-on-insert has been surprising. From Titus: Google's experience is that maps don't shrink in the way this would help with. NAD, not worth the time. Write a paper if you can demonstrate a need for this.

msg4750 (view)

Date: 2010-11-24.14:28:04

[ Moved to NAD Future at 2010-11 Batavia ]

msg1068 (view)

Date: 2010-10-21.19:00:35

[ 2010 Rapperswil: ]

This seems to a useful extension, but is too late for 0x. Move to Tentatively NAD Future.

msg1067 (view)

Date: 2009-08-10.00:00:00

Unordered associative containers have a notion of a maximum load factor: when the number of elements grows large enough, the containers automatically perform a rehash so that the number of elements per bucket stays below a user-specified bound. This ensures that the hash table's performance characteristics don't change dramatically as the size increases.

For similar reasons, Google has found it useful to specify a minimum load factor: when the number of elements shrinks by a large enough, the containers automatically perform a rehash so that the number of elements per bucket stays above a user-specified bound. This is useful for two reasons. First, it prevents wasting a lot of memory when an unordered associative container grows temporarily. Second, it prevents amortized iteration time from being arbitrarily large; consider the case of a hash table with a billion buckets and only one element. (This was discussed even before TR1 was published; it was TR issue 6.13, which the LWG closed as NAD on the grounds that it was a known design feature. However, the LWG did not consider the approach of a minimum load factor.)

The only interesting question is when shrinking is allowed. In principle the cleanest solution would be shrinking on erase, just as we grow on insert. However, that would be a usability problem; it would break a number of common idioms involving erase. Instead, Google's hash tables only shrink on insert and rehash.

The proposed resolution allows, but does not require, shrinking in rehash, mostly because a postcondition for rehash that involves the minimum load factor would be fairly complicated. (It would probably have to involve a number of special cases and it would probably have to mention yet another parameter, a minimum bucket count.)

The current behavior is equivalent to a minimum load factor of 0. If we specify that 0 is the default, this change will have no impact on backward compatibility.

History
Date	User	Action	Args
2018-06-22 06:38:21	admin	set	messages: + msg9959
2018-06-22 06:38:21	admin	set	status: lewg -> nad
2014-11-24 15:11:58	admin	set	status: nad future -> lewg
2010-11-24 14:28:04	admin	set	messages: + msg5442
2010-10-21 19:00:35	admin	set	messages: + msg4750
2010-10-21 19:00:35	admin	set	status: new -> nad future
2010-10-21 18:28:33	admin	set	messages: + msg1068
2009-08-10 00:00:00	admin	create