Issue 291: Underspecification of set algorithms

Title: Underspecification of set algorithms
Status: cd1
Section: [alg.set.operations]
Submitter: Matt Austern

Created on 2001-01-03.00:00:00 last changed 178 months ago

Messages

Date: 2010-10-21.18:28:33

Rationale:

For simple cases, these descriptions are equivalent to what's already in the Standard. For more complicated cases, they describe the behavior of existing implementations.

msg2130 (view)

Date: 2010-10-21.18:28:33

[ Santa Cruz: it's believed that this language is clearer than what's in the Standard. However, it's also believed that the Standard may already make these guarantees (although not quite in these words). Bill and Howard will check and see whether they think that some or all of these changes may be redundant. If so, we may close this issue as NAD. ]

msg2129 (view)

Date: 2010-10-21.18:28:33

Proposed resolution:

Add the following to the end of [set.union] paragraph 5:

If [first1, last1) contains m elements that are equivalent to each other and [first2, last2) contains n elements that are equivalent to them, then max(m, n) of these elements will be copied to the output range: all m of these elements from [first1, last1), and the last max(n-m, 0) of them from [first2, last2), in that order.

Add the following to the end of [set.intersection] paragraph 5:

If [first1, last1) contains m elements that are equivalent to each other and [first2, last2) contains n elements that are equivalent to them, the first min(m, n) of those elements from [first1, last1) are copied to the output range.

Add a new paragraph, Notes, after [set.difference] paragraph 4:

If [first1, last1) contains m elements that are equivalent to each other and [first2, last2) contains n elements that are equivalent to them, the last max(m-n, 0) elements from [first1, last1) are copied to the output range.

Add a new paragraph, Notes, after [set.symmetric.difference] paragraph 4:

If [first1, last1) contains m elements that are equivalent to each other and [first2, last2) contains n elements that are equivalent to them, then |m - n| of those elements will be copied to the output range: the last m - n of these elements from [first1, last1) if m > n, and the last n - m of these elements from [first2, last2) if m < n.

msg2128 (view)

Date: 2001-01-03.00:00:00

The standard library contains four algorithms that compute set operations on sorted ranges: set_union, set_intersection, set_difference, and set_symmetric_difference. Each of these algorithms takes two sorted ranges as inputs, and writes the output of the appropriate set operation to an output range. The elements in the output range are sorted.

The ordinary mathematical definitions are generalized so that they apply to ranges containing multiple copies of a given element. Two elements are considered to be "the same" if, according to an ordering relation provided by the user, neither one is less than the other. So, for example, if one input range contains five copies of an element and another contains three, the output range of set_union will contain five copies, the output range of set_intersection will contain three, the output range of set_difference will contain two, and the output range of set_symmetric_difference will contain two.

Because two elements can be "the same" for the purposes of these set algorithms, without being identical in other respects (consider, for example, strings under case-insensitive comparison), this raises a number of unanswered questions:

If we're copying an element that's present in both of the input ranges, which one do we copy it from?
If there are n copies of an element in the relevant input range, and the output range will contain fewer copies (say m) which ones do we choose? The first m, or the last m, or something else?
Are these operations stable? That is, does a run of equivalent elements appear in the output range in the same order as as it appeared in the input range(s)?

The standard should either answer these questions, or explicitly say that the answers are unspecified. I prefer the former option, since, as far as I know, all existing implementations behave the same way.

History
Date	User	Action	Args
2010-10-21 18:28:33	admin	set	messages: + msg2131
2010-10-21 18:28:33	admin	set	messages: + msg2130
2010-10-21 18:28:33	admin	set	messages: + msg2129
2001-01-03 00:00:00	admin	create