Title
The past end issue for `lazy_split_view`
Status
new
Section
[range.lazy.split.outer]
Submitter
Hewill Kang

Created on 2025-04-26.00:00:00 last changed 7 days ago

Messages

Date: 2025-10-21.15:17:02

Proposed resolution:

This wording is relative to N5014.

  1. Modify [range.lazy.split.outer] as indicated:

    constexpr outer-iterator& operator++();
    

    -6- Effects: Equivalent to:

    const auto end = ranges::end(parent_->base_);
    if (current == end) {
      trailing_empty_ = false;
      return *this;
    }
    const auto [pbegin, pend] = subrange{parent_->pattern_};
    if (pbegin == pend) ++current;
    else if constexpr (tiny-range<Pattern>) {
      current = ranges::find(std::move(current), end, *pbegin);
      if (current != end) {
        ++current;
        if (current == end)
          trailing_empty_ = true;
        else if constexpr (!forward_range<V>)
          trailing_empty_ = true;
      }
    }
    else {
      do {
        auto [b, p] = ranges::mismatch(current, end, pbegin, pend);
        if (p == pend) {
          current = b;
          if (current == end)
            trailing_empty_ = true;
          break;            // The pattern matched; skip it
        }
      } while (++current != end);
    }
    return *this;
    
Date: 2025-10-15.00:00:00

[ 2025-10-21; Hewill Kang provides simpler wording ]

Date: 2025-10-15.00:00:00

[ 2025-10-21; Reflector poll. ]

Set priority to 2 after reflector poll.

"This is unfortunate. `lazy_split` is probably not very commonly used, but handling input ranges was half the reason why we kept it around. Can we reuse `trailing_empty_` for this instead of adding a new flag? Both flags have the same meaning, and the two cases where they are true are disjoint: we need `has_next_` when iterating through the inner range exhausted the source range; we need `trailing_empty_` when we find a delimiter at the end of the source range when incrementing the outer iterator, which by definition means that iterating through the inner range didn't exhaust it."

This wording is relative to N5008.

  1. Modify [range.lazy.split.outer] as indicated:

    namespace std::ranges {
      template<input_range V, forward_range Pattern>
        requires view<V> && view<Pattern> &&
                 indirectly_comparable<iterator_t<V>, iterator_t<Pattern>, ranges::equal_to> &&
                 (forward_range<V> || tiny-range<Pattern>)
      template<bool Const>
      struct lazy_split_view<V, Pattern>::outer-iterator {
      private:
        using Parent = maybe-const<Const, lazy_split_view>;     // exposition only
        using Base = maybe-const<Const, V>;                     // exposition only
        Parent* parent_ = nullptr;                              // exposition only
    
        iterator_t<Base> current_ = iterator_t<Base>();         // exposition only, present only
                                                                // if V models forward_range
    
        bool trailing_empty_ = false;                           // exposition only
        bool has_next_ = false;                                 // exposition only, present only
                                                                // if forward_range<V> is false
      public:
        […]
      };
    }
    
    […]
    constexpr explicit outer-iterator(Parent& parent)
      requires (!forward_range<Base>);
    

    -2- Effects: Initializes `parent_` with `addressof(parent)` and `has_next_` with `current != ranges::end(parent_->base_)`.

    […]
    constexpr outer-iterator& operator++();
    

    -6- Effects: Equivalent to:

    const auto end = ranges::end(parent_->base_);
    if (current == end) {
      trailing_empty_ = false;
      if constexpr (!forward_range<V>)
        has_next_ = false;
      return *this;
    }
    const auto [pbegin, pend] = subrange{parent_->pattern_};
    if (pbegin == pend) ++current;
    else if constexpr (tiny-range<Pattern>) {
      current = ranges::find(std::move(current), end, *pbegin);
      if (current != end) {
        ++current;
        if (current == end)
          trailing_empty_ = true;
      }
    }
    else {
      do {
        auto [b, p] = ranges::mismatch(current, end, pbegin, pend);
        if (p == pend) {
          current = b;
          if (current == end)
            trailing_empty_ = true;
          break;            // The pattern matched; skip it
        }
      } while (++current != end);
    }
    if constexpr (!forward_range<V>)
      if (current == end)
        has_next_ = false;
    return *this;
    
    […]
    friend constexpr bool operator==(const outer-iterator& x, default_sentinel_t);
    

    -8- Effects: Equivalent to:

    if constexpr (!forward_range<V>)
      return !x.has_next_ && !x.trailing_empty_;
    else
      return x.current == ranges::end(x.parent_->base_) && !x.trailing_empty_;
    
Date: 2025-04-26.00:00:00

Consider (demo):

#include <print>
#include <ranges>
#include <sstream>

int main() {
  std::istringstream is{"1 0 2 0 3"};
  auto r = std::views::istream<int>(is)
         | std::views::lazy_split(0)
         | std::views::stride(2);
  std::println("{}", r); // should print [[1], [3]]
}

The above leads to SIGSEGV in libstdc++. The reason is that the nested range is iterated as:

for (auto&& inner : r) {
  for (auto&& elem : inner) {
    // […]
  }
}

which expands to:

auto outer_it = r.begin();
std::default_sentinel_t out_end = r.end();
for(; outer_it != out_end; ++outer_it) {
  auto&& inner_r = *outer_it;
  auto inner_it = inner_r.begin();
  std::default_sentinel_t inner_end = inner_r.end();
  for(; inner_it != inner_end; ++inner_it) {
    auto&& elem = *inner_it;
    // […]
  }
}

Since `inner_it` and `outer_it` actually update the same underlying iterator, by the time we return to the outer loop the `lazy_split_view::outer-iterator` already compares equal to `default_sentinel`, i.e. `outer_it` has reached the end. `++outer_it` then increments the iterator past the end, triggering the assertion.

Note that this also happens in MSVC-STL when `_ITERATOR_DEBUG_LEVEL` is turned on.

It seems that an extra flag is needed to fix this issue, because `outer_it` should not be considered to have reached the end when we return to the outer loop.

History
Date User Action Args
2025-10-21 15:17:02 admin set messages: + msg15330
2025-10-21 15:17:02 admin set messages: + msg15327
2025-04-27 16:09:32 admin set messages: + msg14735
2025-04-26 00:00:00 admin create