Title
Allow overwriting of std::basic_string terminator with charT() to allow cleaner interoperation with legacy APIs
Status
c++17
Section
[string.access]
Submitter
Matt Weber

Created on 2015-02-21.00:00:00 last changed 89 months ago

Messages

Date: 2016-08-03.12:32:27

Proposed resolution:

This wording is relative to N4296.

  1. Edit [string.access] as indicated:

    const_reference operator[](size_type pos) const;
    reference operator[](size_type pos);
    

    -1- Requires: […]

    -2- Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object to any value other than charT() leads to undefined behavior.

    […]

Date: 2016-08-06.21:12:20

[ 2016-08 Chicago ]

Tues PM: This should also apply to non-const data(). Billy to update wording.

Fri PM: Move to Tentatively Ready

Date: 2015-02-21.00:00:00

It is often desirable to use a std::basic_string object as a buffer when interoperating with libraries that mutate null-terminated arrays of characters. In many cases, these legacy APIs write a null terminator at the specified end of the provided buffer. Providing such a function with an appropriately-sized std::basic_string results in undefined behavior when the charT object at the size() position is overwritten, even if the value remains unchanged.

Without this allowance, applications are forced into pessimizations such as providing an appropriately sized std::vector of charT to the legacy API and then copying its contents into a std::basic_string, or providing an oversized std::basic_string and calling resize() afterwards.

A trivial example:

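#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a legacy API that writes count characters plus a terminator.
void legacy_function(char *out, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    *out++ = static_cast<char>('0' + (i % 10));
  }
  *out = '\0'; // if size() == count, this results in undefined behavior
}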

int main() {
  std::string s(10, '\0');
  legacy_function(&s[0], s.size()); // undefined behavior

  std::vector<char> buffer(11);
  legacy_function(&buffer[0], buffer.size() - 1);
  std::string t(&buffer[0], buffer.size() - 1); // potentially expensive copy

  std::string u(11, '\0');
  legacy_function(&u[0], u.size() - 1);
  u.resize(u.size() - 1); // needlessly complicates the program's logic
}

A slight relaxation of the requirement on the returned object from the element access operator would allow for this interaction with no semantic change to existing programs.

History
Date                 User   Action  Args
2017-07-30 20:15:43  admin  set     status: wp -> c++17
2016-11-14 03:59:28  admin  set     status: pending -> wp
2016-11-14 03:55:22  admin  set     status: ready -> pending
2016-08-06 21:12:20  admin  set     status: open -> ready
2016-08-03 12:32:27  admin  set     messages: + msg8358
2016-08-03 12:32:27  admin  set     status: new -> open
2015-03-31 21:10:25  admin  set     messages: + msg7291
2015-02-21 00:00:00  admin  create