Created on 2025-09-16.00:00:00 last changed 3 weeks ago
Proposed resolution:
Modify [basic.string.general] as indicated:
-3- In all cases, [`data()`, `data() + size()`] is a valid range, `data() + size()` points at an object with value `charT()` (a "null terminator"), and `size() <= capacity()` is `true`. Non-const access to the null terminator is possible, e.g. using `*(data() + size())`, but the program has undefined behavior if the null terminator is modified to any value other than `charT()`.
Modify [string.access] as indicated:
constexpr const_reference operator[](size_type pos) const;
constexpr reference operator[](size_type pos);
-1- Hardened Preconditions: `pos <= size()` is `true`.
-2- Returns: `*(data() + pos)`.
`*(begin() + pos)` if `pos < size()`. Otherwise, returns a reference to an object of type `charT` with value `charT()`, where modifying the object to any value other than `charT()` leads to undefined behavior.
-3- Throws: Nothing.
-4- Complexity: Constant time.
Modify [string.accessors] as indicated:
constexpr const charT* c_str() const noexcept;
constexpr const charT* data() const noexcept;
constexpr charT* data() noexcept;
-1- Returns: `to_address(begin())`.
A pointer `p` such that `p + i == addressof(operator[](i))` for each `i` in [`0`, `size()`].
-2- Complexity: Constant time.
-3- Remarks: The program shall not modify any of the values stored in the character array; otherwise, the behavior is undefined.
constexpr charT* data() noexcept;
-4- Returns: A pointer `p` such that `p + i == addressof(operator[](i))` for each `i` in [`0`, `size()`].
-5- Complexity: Constant time.
-6- Remarks: The program shall not modify the value stored at `p + size()` to any value other than `charT()`; otherwise, the behavior is undefined.
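As an illustration of the [basic.string.general] paragraph in this resolution, the following is a minimal sketch, not part of the proposed wording. The helper names `satisfies_general_invariants` and `rewrite_terminator_with_null` are hypothetical, `std::string` stands in for an arbitrary `basic_string` specialization, and the non-const `data()` overload assumes C++17 or later.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the issue): checks two of the properties the
// proposed paragraph states for every string, including an empty one.
bool satisfies_general_invariants(const std::string& s) {
    const char* p = s.data();
    // data() + size() must point at an object with value charT().
    if (p[s.size()] != char()) return false;
    // size() <= capacity() must be true.
    return s.size() <= s.capacity();
}

// Hypothetical helper: stores charT() over the null terminator through
// non-const access. Only a value other than charT() would be undefined
// under the proposed wording, so this store leaves the string unchanged.
bool rewrite_terminator_with_null(std::string& s) {
    *(s.data() + s.size()) = char();
    return satisfies_general_invariants(s);
}
```

Calling `rewrite_terminator_with_null` on any string is expected to leave its contents and size untouched; the sketch deliberately never stores a non-null value at `data() + size()`.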
[ 2025-11-11; Jonathan provides new wording ]
We say that `basic_string` is a contiguous container, which makes the `addressof` wording in `c_str()` and `data()` redundant. The front matter says that there's a null terminator present, so we can move the rule about not modifying the terminator there instead of repeating it in `operator[]` and `c_str()`.
We can also permit modifying the string contents through `const_cast<char*>(str.c_str())[0]`. There's no reason for that to be undefined when `const_cast<string&>(str)[0]` and `const_cast<string&>(str).data()[0]` are both allowed. The only restriction should be on changing the null terminator. Changing any other characters through `c_str() const` or `data() const` is no different to changing them through the non-const `data()`, and does not need to cause undefined behaviour.
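The aliasing argument in this note can be sketched as follows. This is an illustration only: the helper name `const_and_nonconst_access_alias` is hypothetical, and the code writes only through the two paths the note describes as already allowed. It does not write through `const_cast<char*>(str.c_str())`, since that remains undefined under the wording in N5014.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the issue). `view` must refer to a string
// object that was not itself declared const; otherwise a write through
// const_cast is undefined regardless of the library wording.
bool const_and_nonconst_access_alias(const std::string& view) {
    if (view.empty()) return true;  // no character other than the terminator
    std::string& mut = const_cast<std::string&>(view);
    // The const accessors and the non-const accessors name the same storage.
    const bool same_object =
        &mut[0] == view.c_str() && mut.data() == view.data();
    // A write through the non-const data() is visible through c_str().
    const char saved = mut[0];
    mut.data()[0] = 'X';
    const bool visible = view.c_str()[0] == 'X';
    mut[0] = saved;  // restore the original character
    return same_object && visible;
}
```

Because all four accessors reach the same character object, the only thing distinguishing the `c_str()` path is the Remarks paragraph, which is the restriction the note proposes to drop.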
[ 2025-10-21; Reflector poll. ]
Set priority to 4 after reflector poll.
"NAD. `begin() + size()` is not dereferenceable and should remain that way."
"Saying "if `pos <= size()`" is redundant given the precondition above."
"The resolution removes any guarantee that the value at `str[str.size()]` is `charT()`. Furthermore, the premise of the issue is incorrect, returning the address of a different null terminator not belonging to the string would make traversing it with other string operations UB, so it has to return a reference to a terminator that's within the same array."
"`*(begin() + size())` is UB, but could use `*(data() + size())` instead. Personally I'd like `*end()` to be valid, but that's certainly LEWG business requiring a paper."
This wording is relative to N5014.
Modify [string.access] as indicated:
constexpr const_reference operator[](size_type pos) const;
constexpr reference operator[](size_type pos);
-1- Hardened preconditions: `pos <= size()` is `true`.
-2- Returns: `*(begin() + pos)` if `pos <= size()`.
Otherwise, returns a reference to an object of type `charT` with value `charT()`, where modifying the object to any value other than `charT()` leads to undefined behavior.
-3- Throws: Nothing.
-4- Complexity: Constant time.
-?- Remarks: The program shall not modify the value stored at `size()` to any value other than `charT()`; otherwise, the behavior is undefined.
From the working draft N5014, the specification for `operator[]` in [string.access] p2 says:
Returns: `*(begin() + pos)` if `pos < size()`. Otherwise, returns a reference to an object of type `charT` with value `charT()`, where modifying the object to any value other than `charT()` leads to undefined behavior.
The specification for data() in [string.accessors] p1 (and p4) says, however:
Returns: A pointer `p` such that `p + i == addressof(operator[](i))` for each `i` in `[0, size()]`.
The former implies that `str[str.size()]` is allowed to be the address of any null terminator, while the latter restricts it to only being the null terminator belonging to the string.
Suggested fix: Change the wording around `operator[]` to:
Returns: `*(begin() + pos)` if `pos <= size()`. The program shall not modify the value stored at `size()` to any value other than `charT()`; otherwise, the behavior is undefined.
This brings it in line with the `data()` specification. Given the hardened precondition that `pos <= size()`, this does not change behavior for any in-contract access, and we do not define what the feature does when called with broken preconditions. I have been looking at the latter, but that will be an EWG paper instead.
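The relationship between `operator[]` and `data()` that the issue relies on can be checked with a short sketch. The helper names `index_matches_data` and `c_length_from_index` are hypothetical and not taken from the issue; the code reflects how implementations are expected to behave under the `data()` specification rather than anything the `operator[]` wording in N5014 guarantees by itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper: true if data() + i names the same object as
// operator[](i) for every i in [0, size()], terminator included.
bool index_matches_data(const std::string& str) {
    const char* p = str.data();
    for (std::size_t i = 0; i <= str.size(); ++i) {
        if (p + i != std::addressof(str[i])) return false;
    }
    return true;
}

// Hypothetical helper: scans from &str[0] as a C string. The scan is only
// meaningful if the terminator lives in the same array as the characters.
// Assumes `str` contains no embedded null characters.
std::size_t c_length_from_index(const std::string& str) {
    return std::strlen(std::addressof(str[0]));
}
```

If `str[str.size()]` were allowed to refer to a terminator outside the string's own array, the loop in `index_matches_data` would fail at `i == size()`, which is the inconsistency between the two specifications that the issue describes.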
History
| Date | User | Action | Args |
|---|---|---|---|
| 2025-11-11 21:49:25 | admin | set | messages: + msg15735 |
| 2025-10-21 11:45:47 | admin | set | messages: + msg15301 |
| 2025-09-21 05:42:21 | admin | set | messages: + msg15072 |
| 2025-09-16 00:00:00 | admin | create | |