Title
Parallel `ranges::set_intersection` should not do unnecessary work
Status
immediate
Section
[set.intersection]
Submitter
Ruslan Arutyunyan

Created on 2026-03-20.00:00:00 last changed 6 days ago

Messages

Date: 2026-03-24.17:22:19

Proposed resolution:

This wording is relative to N5032.

  1. Modify [set.intersection] as indicated:

    […]
    template<input_iterator I1, sentinel_for<I1> S1, input_iterator I2, sentinel_for<I2> S2,
             weakly_incrementable O, class Comp = ranges::less,
             class Proj1 = identity, class Proj2 = identity>
      requires mergeable<I1, I2, O, Comp, Proj1, Proj2>
      constexpr ranges::set_intersection_result<I1, I2, O>
        ranges::set_intersection(I1 first1, S1 last1, I2 first2, S2 last2, O result,
                                 Comp comp = {}, Proj1 proj1 = {}, Proj2 proj2 = {});
    template<input_range R1, input_range R2, weakly_incrementable O,
             class Comp = ranges::less, class Proj1 = identity, class Proj2 = identity>
      requires mergeable<iterator_t<R1>, iterator_t<R2>, O, Comp, Proj1, Proj2>
      constexpr ranges::set_intersection_result<borrowed_iterator_t<R1>, borrowed_iterator_t<R2>, O>
        ranges::set_intersection(R1&& r1, R2&& r2, O result,
                                 Comp comp = {}, Proj1 proj1 = {}, Proj2 proj2 = {});
    
    template<execution-policy Ep, random_access_iterator I1, sized_sentinel_for<I1> S1,
             random_access_iterator I2, sized_sentinel_for<I2> S2,
             random_access_iterator O, sized_sentinel_for<O> OutS, class Comp = ranges::less,
             class Proj1 = identity, class Proj2 = identity>
      requires mergeable<I1, I2, O, Comp, Proj1, Proj2>
      ranges::set_intersection_result<I1, I2, O>
        ranges::set_intersection(Ep&& exec, I1 first1, S1 last1,
                                 I2 first2, S2 last2, O result, OutS result_last,
                                 Comp comp = {}, Proj1 proj1 = {}, Proj2 proj2 = {});
    template<execution-policy Ep, sized-random-access-range R1, sized-random-access-range R2,
             sized-random-access-range OutR, class Comp = ranges::less,
             class Proj1 = identity, class Proj2 = identity>
      requires mergeable<iterator_t<R1>, iterator_t<R2>, iterator_t<OutR>, Comp, Proj1, Proj2>
      ranges::set_intersection_result<borrowed_iterator_t<R1>, borrowed_iterator_t<R2>,
                                      borrowed_iterator_t<OutR>>
        ranges::set_intersection(Ep&& exec, R1&& r1, R2&& r2, OutR&& result_r, Comp comp = {},
                                 Proj1 proj1 = {}, Proj2 proj2 = {});
    

    -1- Let: […]

    -2- Preconditions: The ranges `[first1, last1)` and `[first2, last2)` are sorted with respect to `comp` and `proj1` or `proj2`, respectively. The resulting range does not overlap with either of the original ranges.

    -3- Effects: Constructs a sorted intersection of the elements from the two ranges; that is, the set of elements that are present in both of the ranges. If `[first1, last1)` contains m elements that are equivalent to each other and `[first2, last2)` contains n elements that are equivalent to them, the first min(m, n) elements from the first range are included in the sorted intersection. If, of those elements, k elements from the first range are copied to the output range, then the first k elements from the second range are considered skipped. If N < M, a A non-copied element is also considered skipped if it compares less than the min(`M`, `N` + 1)th (N + 1)th element of the sorted intersection. Copies the first N elements of the sorted intersection to the range [result, result + N).

    -4- Returns:

    • (4.1) — `result_last` for the overloads in namespace `std`.

    • (4.2) — {last1, last2, result + N} for the non-parallel algorithm overloads in namespace `ranges`, if N is equal to M.

    • (4.3) — Otherwise, {first1 + A, first2 + B,result_last result + N} for the parallel algorithm overloads in namespace `ranges`, where A and B are the numbers of copied or skipped elements in `[first1, last1)` and `[first2, last2)`, respectively.

Date: 2026-03-24.00:00:00

[ Croydon 2026-03-24; move to Immediate ]

Date: 2026-03-24.00:00:00

[ Croydon 2026-03-24; update wording after LWG review ]

Date: 2026-03-24.16:48:49

Serial `ranges::set_intersection` returns `in_in_out_result` and it always goes to the end of both input ranges. However, when either of the input ranges reaches the end, there cannot be intersection anymore, so the algorithm could immediately stop but does not do that.

For example, if the first input is `{1}` and second input is {1, 2, 3, 4, …, 1000000} it's clear that the output will be `{1}` after the first iteration of `ranges::set_intersection`. The first input is already exhausted, so the algorithm could stop and return:

  • `it1` that compared equal with first `last1`,

  • `it2` that points to the element equal to 2 in the second input (in other words, stop point for the second input), and

  • `result_it` that points to past the last written element

Unfortunately, the algorithm does 999999 "unnecessary" operations just to calculate the iterator that compared equal to `last2`, not to compute an intersection itself.

Of course, this case is problematic only for non-random access, non-sized ranges. Still, it looks like an oversight because:

  • Serial overload can work with input iterators,

  • We almost certainly do not want to pessimize the algorithm execution time to just calculate `last1` or `last2`, when the algorithm could potentially stop earlier, and

  • `std::set_intersection` (not the `ranges` one) stops when it hits the end of either input.

For parallel `ranges::set_intersection` we just copied the current behavior of serial `ranges::set_intersection` because parallel overload requires sized-random-access-range, so it's not a problem to calculate both `last1` and `last2` iterators for a constant time in this case. Then we realized that this problem exists for the serial overload.

Fixing C++20 `ranges::set_intersection` is a breaking change, unfortunately. However, we can still change the parallel overload to mimic the serial behavior that we want. This is especially beneficial if and when we add serial range algorithms with the range-as-the-output-design aspect.

The proposed resolution:

  • When the output size is not sufficient to copy all necessary elements from both inputs — no changes to the specified parallel `ranges::set_intersection` behavior.

  • When the output size is sufficient, change the behavior of parallel `ranges::set_intersection` to return `last1` or `last2` — whichever input range is exhausted first — and a stop point in non-exhausted input range, which is the element that is:

    • Past the last element of non-exhausted range, if it compared equal with the last written element to the output, or otherwise

    • The last element that does not compare equal to exhausted-range-last - 1 element.

The trade-off of the proposed behavior is that the overload with execution policy would return something different at runtime compared to the serial overload. However, if we don't apply the proposed resolution, we will lock ourselves into suboptimal `ranges::set_intersection` implementation by design, even for the future algorithms extension.

For the behavior illustration purposes please look at the serial implementation of serial `ranges::set_intersection` with range-as-the-output:

struct set_intersection_fn
{
    template<std::input_iterator I1, std::sentinel_for<I1> S1,
             std::input_iterator I2, std::sentinel_for<I2> S2,
             std::input_iterator O, std::sentinel_for<O> OutS,
             class Comp = std::ranges::less,
             class Proj1 = std::identity, class Proj2 = std::identity>
    requires std::mergeable<I1, I2, O, Comp, Proj1, Proj2>
    constexpr std::ranges::set_intersection_result<I1, I2, O>
      operator()(I1 first1, S1 last1, I2 first2, S2 last2,
                 O result, OutS result_last, Comp comp = {},
                 Proj1 proj1 = {}, Proj2 proj2 = {}) const
      {
        while (first1 != last1 && first2 != last2)
        {
          if (std::invoke(comp, std::invoke(proj1, *first1),
                                std::invoke(proj2, *first2)))
            ++first1;
          else if (std::invoke(comp, std::invoke(proj2, *first2),
                                     std::invoke(proj1, *first1)))
            ++first2;
          else
          {
            if (result == result_last)
              break;

            *result = *first1;
            ++first1;
            ++first2;
            ++result;
          }
        }
        return {std::move(first1), std::move(first2), std::move(result)};
      }

    template<std::ranges::input_range R1, std::ranges::input_range R2,
             std::ranges::input_range OutR, class Comp = std::ranges::less,
             class Proj1 = std::identity, class Proj2 = std::identity>
    requires std::mergeable<std::ranges::iterator_t<R1>, std::ranges::iterator_t<R2>,
                            std::ranges::iterator_t<OutR>, Comp, Proj1, Proj2>
    constexpr std::ranges::set_intersection_result<std::ranges::borrowed_iterator_t<R1>,
                                                   std::ranges::borrowed_iterator_t<R2>,
                                                   std::ranges::borrowed_iterator_t<OutR>>
      operator()(R1&& r1, R2&& r2, OutR&& result, Comp comp = {},
                 Proj1 proj1 = {}, Proj2 proj2 = {}) const
      {
        return (*this)(std::ranges::begin(r1), std::ranges::end(r1),
                       std::ranges::begin(r2), std::ranges::end(r2),
                       std::ranges::begin(result), std::ranges::end(result),
                       std::move(comp), std::move(proj1), std::move(proj2));
      }
};

inline constexpr set_intersection_fn set_intersection {};

For instance, if the first input is `std::vector v1{2, 4, 5}` the second input is `std::vector v2{1, 2, 3, 4, 5, 6, 7}`, and the output is std::vector<int> output(3) then for the call

auto [in1, in2, out] = std::ranges::set_difference(v1, v2, output);
  • `output` is `{2, 4, 5}`

  • `in1` points to `v1.end()`

  • `in2` points to `6` in `v2`

  • `out` points to `output.end()`

Another example: if the first input is `std::vector v1{1, 2, 3, 4}` the second input is `std::vector v2{0, 1, 2, 3, 5, 7}`, and the output is std::vector<int> output(3) then for the call

auto [in1, in2, out] = std::ranges::set_difference(v1, v2, output);
  • `output` is `{1, 2, 3}`

  • `in1` points to `v1.end()`

  • `in2` points to `5` in `v2`

  • `out` points to `output.end()`

This wording is relative to N5032.

  1. Modify [set.intersection] as indicated:

    […]
    template<input_iterator I1, sentinel_for<I1> S1, input_iterator I2, sentinel_for<I2> S2,
             weakly_incrementable O, class Comp = ranges::less,
             class Proj1 = identity, class Proj2 = identity>
      requires mergeable<I1, I2, O, Comp, Proj1, Proj2>
      constexpr ranges::set_intersection_result<I1, I2, O>
        ranges::set_intersection(I1 first1, S1 last1, I2 first2, S2 last2, O result,
                                 Comp comp = {}, Proj1 proj1 = {}, Proj2 proj2 = {});
    template<input_range R1, input_range R2, weakly_incrementable O,
             class Comp = ranges::less, class Proj1 = identity, class Proj2 = identity>
      requires mergeable<iterator_t<R1>, iterator_t<R2>, O, Comp, Proj1, Proj2>
      constexpr ranges::set_intersection_result<borrowed_iterator_t<R1>, borrowed_iterator_t<R2>, O>
        ranges::set_intersection(R1&& r1, R2&& r2, O result,
                                 Comp comp = {}, Proj1 proj1 = {}, Proj2 proj2 = {});
                                 
    template<execution-policy Ep, random_access_iterator I1, sized_sentinel_for<I1> S1,
             random_access_iterator I2, sized_sentinel_for<I2> S2,
             random_access_iterator O, sized_sentinel_for<O> OutS, class Comp = ranges::less,
             class Proj1 = identity, class Proj2 = identity>
      requires mergeable<I1, I2, O, Comp, Proj1, Proj2>
      ranges::set_intersection_result<I1, I2, O>
        ranges::set_intersection(Ep&& exec, I1 first1, S1 last1,
                                 I2 first2, S2 last2, O result, OutS result_last,
                                 Comp comp = {}, Proj1 proj1 = {}, Proj2 proj2 = {});
    template<execution-policy Ep, sized-random-access-range R1, sized-random-access-range R2,
             sized-random-access-range OutR, class Comp = ranges::less,
             class Proj1 = identity, class Proj2 = identity>
      requires mergeable<iterator_t<R1>, iterator_t<R2>, iterator_t<OutR>, Comp, Proj1, Proj2>
      ranges::set_intersection_result<borrowed_iterator_t<R1>, borrowed_iterator_t<R2>,
                                      borrowed_iterator_t<OutR>>
        ranges::set_intersection(Ep&& exec, R1&& r1, R2&& r2, OutR&& result_r, Comp comp = {},
                                 Proj1 proj1 = {}, Proj2 proj2 = {});
    

    -1- Let: […]

    -2- Preconditions: The ranges `[first1, last1)` and `[first2, last2)` are sorted with respect to `comp` and `proj1` or `proj2`, respectively. The resulting range does not overlap with either of the original ranges.

    -3- Effects: Constructs a sorted intersection of the elements from the two ranges; that is, the set of elements that are present in both of the ranges. If `[first1, last1)` contains m elements that are equivalent to each other and `[first2, last2)` contains n elements that are equivalent to them, the first min(m, n) elements from the first range are included in the sorted intersection. If, of those elements, k elements from the first range are copied to the output range, then the first k elements from the second range are considered skipped. If N < M, a non-copied element is also considered skipped if it compares less than the (N + 1)th element of the sorted intersection. Copies the first N elements of the sorted intersection to the range [result, result + N).

    -4- Returns:

    • (4.1) — `result_last` for the overloads in namespace `std`.

    • (4.2) — {last1, last2, result + N} for the non-parallel algorithm overloads in namespace `ranges`, if N is equal to M.

    • (4.?) — {first1 + A, first2 + B, result + N} for the parallel algorithm overloads in namespace `ranges`, if N is equal to M and where A and B are the numbers of copied or skipped elements in `[first1, last1)` and `[first2, last2)`, respectively.

    • (4.3) — Otherwise, {first1 + A, first2 + B, result_last} for the overloads in namespace `ranges`, where A and B are the numbers of copied or skipped elements in `[first1, last1)` and `[first2, last2)`, respectively.

History
Date User Action Args
2026-03-24 17:22:19adminsetmessages: + msg16059
2026-03-24 17:22:19adminsetstatus: new -> immediate
2026-03-24 16:48:49adminsetmessages: + msg16057
2026-03-21 16:32:41adminsetmessages: + msg16038
2026-03-20 00:00:00admincreate