Title
views::split drops trailing empty range
Status
resolved
Section
[range.split]
Submitter
Barry Revzin

Created on 2020-08-20.00:00:00 last changed 42 months ago

Messages

Date: 2021-06-13.00:00:00

[ 2021-06-13 Resolved by the adoption of P2210R2 at the June 2021 plenary. Status changed: New → Resolved. ]

Date: 2020-09-15.00:00:00

[ 2020-09-02; Reflector prioritization ]

Set priority to 2 as result of reflector discussions.

Date: 2020-08-24.13:28:26

From StackOverflow, the program:

#include <iostream>
#include <string>
#include <ranges>

int main()
{
  std::string s = " text ";
  auto sv = std::ranges::views::split(s, ' ');
  std::cout << std::ranges::distance(sv.begin(), sv.end());
}

prints 2 (as specified), but it really should print 3. If a range contains N delimiters, splitting should produce N+1 pieces. However, if the Nth delimiter is the last element of the input range, views::split produces only N pieces: it doesn't emit a trailing empty range.
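To make the N+1 expectation concrete, here is a minimal sketch of a hand-written splitter that keeps every empty piece. The name split_keep_empty is hypothetical and is not part of the standard library or of any proposed wording; it only illustrates the behavior the issue argues for.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not a standard facility): splits `input` on every
// occurrence of `delim` and keeps empty pieces, so an input containing
// N delimiters always yields exactly N + 1 pieces.
std::vector<std::string_view> split_keep_empty(std::string_view input, char delim)
{
  std::vector<std::string_view> pieces;
  std::size_t start = 0;
  while (true) {
    const std::size_t pos = input.find(delim, start);
    if (pos == std::string_view::npos) {
      // No delimiter left: emit the remainder, even if it is empty.
      pieces.push_back(input.substr(start));
      break;
    }
    // Emit the (possibly empty) piece before the delimiter.
    pieces.push_back(input.substr(start, pos - start));
    start = pos + 1;
  }
  return pieces;
}
```

With this helper, split_keep_empty(" text ", ' ') returns three pieces: "", "text", and "".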

Surveying a number of languages gives a sense of what they do here. There are basically two groups (Haskell appears in both because it has several different split functions):

  1. Rust, Python, JavaScript, Go, Kotlin, and Haskell's "splitOn" all provide N+1 parts if there are N delimiters.

  2. APL, D, Elixir, Haskell's "words", Ruby, and Clojure all drop every empty word. Splitting " x " on " " would give ["x"] here, whereas the languages in the first group would give ["", "x", ""].
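For contrast, the second group's behavior might be sketched as follows. Again, split_skip_empty is a hypothetical name used only for illustration; it does not correspond to any standard or proposed facility.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not a standard facility): splits `input` on `delim`
// and discards every empty piece, as the second group of languages does.
std::vector<std::string_view> split_skip_empty(std::string_view input, char delim)
{
  std::vector<std::string_view> pieces;
  std::size_t start = 0;
  while (start < input.size()) {
    // Skip any run of delimiters before the next word.
    const std::size_t first = input.find_first_not_of(delim, start);
    if (first == std::string_view::npos) {
      break;  // Only delimiters remain.
    }
    const std::size_t last = input.find(delim, first);
    if (last == std::string_view::npos) {
      pieces.push_back(input.substr(first));  // Final word runs to the end.
      break;
    }
    pieces.push_back(input.substr(first, last - first));
    start = last + 1;
  }
  return pieces;
}
```

Here split_skip_empty(" x ", ' ') returns the single piece "x".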

Java is distinct from both groups: it is mostly a first-category language, except that by default it removes all trailing empty strings (while keeping all leading and intermediate empty strings, unlike the second-category languages). It does have a parameter that lets you keep the trailing ones too.

C++20's behavior is closest to Java's default, except that it removes only one trailing empty string instead of every trailing empty string, and this behavior is not parameterizable. But I think the intent is to be squarely in the first category, so I think the current behavior is just a specification error.
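The difference between the three behaviors can be modeled roughly as a post-processing step on the full N+1 list of pieces. The helper drop_trailing_empties and its limit parameter are hypothetical and exist only for this illustration; neither Java nor the C++20 wording is specified this way.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (an illustrative model only): removes at most `limit`
// empty pieces from the end of `pieces`.
//   limit == 0             -> first-category behavior (keep everything)
//   limit == 1             -> roughly C++20 views::split as described above
//   limit == pieces.size() -> roughly Java's default behavior
std::vector<std::string_view> drop_trailing_empties(
    std::vector<std::string_view> pieces, std::size_t limit)
{
  std::size_t dropped = 0;
  while (dropped < limit && !pieces.empty() && pieces.back().empty()) {
    pieces.pop_back();
    ++dropped;
  }
  return pieces;
}
```

For the pieces {"a", "", ""}, a limit of 0 keeps all three, a limit of 1 leaves {"a", ""}, and a limit equal to the number of pieces leaves only {"a"}.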

Many of these languages also provide an additional extra parameter to limit how many splits happen (e.g. Java, Kotlin, Python, Rust, JavaScript), but that's a separate design question.

History
Date                 User   Action  Args
2021-06-14 14:09:26  admin  set     messages: + msg11924
2021-06-14 14:09:26  admin  set     status: new -> resolved
2020-09-02 17:46:04  admin  set     messages: + msg11469
2020-08-20 00:00:00  admin  create