Title
Inconsistency between `std::basic_string`'s `data()` and `operator[]` specification
Status
new
Section
[string.access]
Submitter
Peter Bindels

Created on 2025-09-16.00:00:00 last changed 3 weeks ago

Messages

Date: 2025-11-12.10:10:53

Proposed resolution:

  1. Modify [basic.string.general] as indicated:

    -3- In all cases, [`data()`, `data() + size()`] is a valid range, `data() + size()` points at an object with value `charT()` (a "null terminator"), and size() <= capacity() is `true`. Non-const access to the null terminator is possible, e.g. using `*(data()+size())`, but the program has undefined behavior if the null terminator is modified to any value other than `charT()`.

  2. Modify [string.access] as indicated:

    constexpr const_reference operator[](size_type pos) const;
    constexpr reference       operator[](size_type pos);
    

    -1- Hardened preconditions: pos <= size() is `true`.

    -2- Returns: `*(data() + pos)`. `*(begin() + pos)` if pos < size(). Otherwise, returns a reference to an object of type `charT` with value `charT()`, where modifying the object to any value other than `charT()` leads to undefined behavior.

    -3- Throws: Nothing.

    -4- Complexity: Constant time.

  3. Modify [string.accessors] as indicated:

    constexpr const charT* c_str() const noexcept;
    constexpr const charT* data() const noexcept;
    constexpr charT* data() noexcept;
    

    -1- Returns: `to_address(begin())`. A pointer `p` such that `p + i == addressof(operator[](i))` for each `i` in [`0`, `size()`].

    -2- Complexity: Constant time.

    -3- Remarks: The program shall not modify any of the values stored in the character array; otherwise, the behavior is undefined.

    constexpr charT* data() noexcept;
    

    -4- Returns: A pointer `p` such that `p + i == addressof(operator[](i))` for each `i` in [`0`, `size()`].

    -5- Complexity: Constant time.

    -6- Remarks: The program shall not modify the value stored at `p + size()` to any value other than `charT()`; otherwise, the behavior is undefined.
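The invariants named in the proposed [basic.string.general] p3 can be checked with a short program. This is only a sketch: the names `check_terminator_invariants` and `example` are hypothetical and not part of the wording, and the non-const `data()` overload it uses assumes C++17 or later.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the proposed wording): checks the
// invariants listed in the proposed [basic.string.general] p3.
template <class String>
bool check_terminator_invariants(String& s) {
    using charT = typename String::value_type;
    using size_type = typename String::size_type;

    // size() <= capacity() is true in all cases.
    if (!(s.size() <= s.capacity())) return false;

    // data() + size() points at an object with value charT().
    if (*(s.data() + s.size()) != charT()) return false;

    // Non-const access to the terminator is possible. Storing charT()
    // keeps the value unchanged, so this write is permitted.
    *(s.data() + s.size()) = charT();

    // operator[] yields the same value as *(data() + pos) for pos <= size().
    for (size_type pos = 0; pos <= s.size(); ++pos) {
        if (s[pos] != *(s.data() + pos)) return false;
    }
    return true;
}

// Hypothetical driver exercising narrow, wide, and empty strings.
void example() {
    std::string narrow = "hello";
    std::wstring wide = L"wide";
    std::string empty;
    assert(check_terminator_invariants(narrow));
    assert(check_terminator_invariants(wide));
    assert(check_terminator_invariants(empty));
}
```

Writing any value other than `charT()` to `*(s.data() + s.size())` is deliberately not shown, since that is the one operation the wording leaves undefined.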

Date: 2025-11-15.00:00:00

[ 2025-11-11; Jonathan provides new wording ]

We say that `basic_string` is a contiguous container, which makes the `addressof` wording in `c_str()` and `data()` redundant. The front matter says that there's a null terminator present, so we can move the rule about not modifying the terminator there instead of repeating it in `operator[]` and `c_str()`.

We can also permit modifying the string contents through `const_cast<char*>(str.c_str())[0]`. There's no reason for that to be undefined when `const_cast<string&>(str)[0]` and `const_cast<string&>(str).data()[0]` are both allowed. The only restriction should be on changing the null terminator. Changing any other characters through `c_str() const` or `data() const` is no different to changing them through the non-const `data()`, and does not need to cause undefined behaviour.
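The comparison above can be illustrated with the two forms that are already allowed. This is a sketch under stated assumptions: the helper names are hypothetical, the referenced string object must not itself be const, and the non-const `data()` overload assumes C++17 or later. The `c_str()` form is left as a comment because it is undefined under the N5014 Remarks and would only become valid with the new wording.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: overwrites the first character through operator[].
// The caller must pass a reference to an object that is not itself const.
void set_first_via_subscript(const std::string& str, char c) {
    assert(!str.empty());
    const_cast<std::string&>(str)[0] = c;
}

// Hypothetical helper: overwrites the first character through the
// non-const data() overload. Same requirement on the referenced object.
void set_first_via_data(const std::string& str, char c) {
    assert(!str.empty());
    const_cast<std::string&>(str).data()[0] = c;
}

// The form the new wording would additionally permit (not executed here,
// because N5014 [string.accessors] p3 makes it undefined):
//   const_cast<char*>(str.c_str())[0] = c;
```

Neither helper touches the character at `str.size()`, which is the only position the new wording would still restrict.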

Date: 2025-10-15.00:00:00

[ 2025-10-21; Reflector poll. ]

Set priority to 4 after reflector poll.

"NAD. `begin() + size()` is not dereferenceable and should remain that way."

"Saying "if pos <= size()" is redundant given the precondition above."

"The resolution removes any guarantee that the value at `str[str.size()]` is `charT()`. Furthermore, the premise of the issue is incorrect, returning the address of a different null terminator not belonging to the string would make traversing it with other string operations UB, so it has to return a reference to a terminator that's within the same array."

"`*(begin() + size())` is UB, but could use `*(data() + size())` instead. Personally I'd like `*end()` to be valid, but that's certainly LEWG business requiring a paper."

This wording is relative to N5014.

  1. Modify [string.access] as indicated:

    constexpr const_reference operator[](size_type pos) const;
    constexpr       reference operator[](size_type pos);
    

    -1- Hardened preconditions: pos <= size() is `true`.

    -2- Returns: `*(begin() + pos)` if pos <= size(). Otherwise, returns a reference to an object of type `charT` with value `charT()`, where modifying the object to any value other than `charT()` leads to undefined behavior.

    -3- Throws: Nothing.

    -4- Complexity: Constant time.

    -?- Remarks: The program shall not modify the value stored at `size()` to any value other than `charT()`; otherwise, the behavior is undefined.

Date: 2025-09-16.00:00:00

From the working draft N5014, the specification for `operator[]` in [string.access] p2 says:

Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type `charT` with value `charT()`, where modifying the object to any value other than `charT()` leads to undefined behavior.

The specification for data() in [string.accessors] p1 (and p4) says, however:

Returns: A pointer `p` such that `p + i == addressof(operator[](i))` for each `i` in `[0, size()]`.

The former implies that `str[str.size()]` is allowed to be the address of any null terminator, while the latter restricts it to only being the null terminator belonging to the string.
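The constraint that the `data()` wording places on `str[str.size()]` can be written as a check. This is a minimal sketch with hypothetical helper names; it uses only the guarantee quoted above from [string.accessors], and the `std::strlen` comparison additionally assumes a string without embedded null characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper: true if str[str.size()] is the object located at
// data() + size(), as the data() wording requires.
bool terminator_is_in_array(const std::string& str) {
    const char* p = str.data();
    return std::addressof(str[str.size()]) == p + str.size();
}

// Hypothetical helper: length found by a C-style scan starting at data().
// The scan stays inside the string's array only because the terminator
// belongs to that array. Matches size() when str has no embedded nulls.
std::size_t c_length(const std::string& str) {
    return std::strlen(str.data());
}

// Hypothetical driver.
void example() {
    std::string s = "hello";
    assert(terminator_is_in_array(s));
    assert(c_length(s) == s.size());
}
```

If `operator[]` were free to return a reference to some other `charT()` object, the first check could fail even though each function on its own met its Returns element.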

Suggested fix: Change wording around `operator[]` to

Returns: `*(begin() + pos)` if pos <= size(). The program shall not modify the value stored at `size()` to any value other than `charT()`; otherwise, the behavior is undefined.

This brings it in line with the `data()` specification. Given the hardened precondition that pos <= size(), this does not change behavior for any in-contract access, and we do not define what the feature does when called with broken preconditions. I have been looking at the latter, but that will be an EWG paper instead.

History
Date User Action Args
2025-11-11 21:49:25 admin set messages: + msg15735
2025-10-21 11:45:47 admin set messages: + msg15301
2025-09-21 05:42:21 admin set messages: + msg15072
2025-09-16 00:00:00 admin create