Title
wstring_convert provides no indication of incomplete input or output
Status
nad
Section
[depr.conversions.string]
Submitter
PowerGamer

Created on 2017-01-08.00:00:00 last changed 83 months ago

Messages

Date: 2017-06-02.00:00:00

[ 2017-06-02 Issues Telecon ]

This facility has a number of known problems, including poor error handling. The feature has been deprecated, and the plan is to replace it with better facilities with a better API.

Resolve as NAD

Date: 2017-06-05.15:41:21

[ 2017-02 in Kona, LEWG recommends NAD ]

Date: 2017-01-27.00:00:00

[ 2017-01-27 Telecon ]

Priority 3; send to LEWG

Date: 2017-01-08.00:00:00

Example:

#include <codecvt>
#include <locale>

// Input UTF-16 string is incomplete - only the first half of
// the UTF-16 surrogate pair L"\xD843\xDEF9":
wchar_t in_utf16[] = L"\xD843";

std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> cvt;
auto out_utf8 = cvt.to_bytes(in_utf16); // No error.

There is no indication that the input was incomplete (the value returned by cvt.state() is not documented and so cannot be examined by the user for that purpose). As a result, the user cannot know that more input data should be provided in an additional call to cvt.to_bytes().

The output can be incomplete too: the MSVC 2017 implementation (which, as far as I can tell, is standard-conforming) produces "\xF0" in out_utf8. Again, std::wstring_convert gives no indication that the output it produced is incomplete.
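To make the "incomplete output" observation concrete, the following sketch works through the arithmetic for the surrogate pair from the example. It only illustrates the UTF-16 and UTF-8 encoding rules; it says nothing about how any particular library implements the conversion. The helper names combine_surrogates and encode_utf8_4 are hypothetical and not part of any standard API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
inline std::uint32_t combine_surrogates(std::uint16_t high, std::uint16_t low) {
    assert(high >= 0xD800 && high <= 0xDBFF); // must be a high surrogate
    assert(low >= 0xDC00 && low <= 0xDFFF);   // must be a low surrogate
    return 0x10000u + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
                    + (static_cast<std::uint32_t>(low) - 0xDC00u);
}

// Hypothetical helper: encode a supplementary-plane code point
// (U+10000..U+10FFFF) as its four UTF-8 bytes.
inline std::array<unsigned char, 4> encode_utf8_4(std::uint32_t cp) {
    assert(cp >= 0x10000u && cp <= 0x10FFFFu);
    return {{
        static_cast<unsigned char>(0xF0u | (cp >> 18)),
        static_cast<unsigned char>(0x80u | ((cp >> 12) & 0x3Fu)),
        static_cast<unsigned char>(0x80u | ((cp >> 6) & 0x3Fu)),
        static_cast<unsigned char>(0x80u | (cp & 0x3Fu))
    }};
}
```

For the pair 0xD843 0xDEF9 this yields the code point U+20EF9, whose UTF-8 form is the four bytes F0 A0 BB B9. A lone "\xF0" is therefore only the lead byte of a four-byte sequence and is not valid UTF-8 on its own.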

IMO this makes std::wstring_convert in its current state completely useless: it cannot be relied upon either to produce a complete and valid UTF sequence or to throw an error in all situations.

Imagine a file that contains UTF-16-encoded text. You want to read all the data from the file at once and convert it to UTF-8 using std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>>.

Now, if the file contains completely invalid UTF-16 (for example, forbidden or incorrectly encoded Unicode code points), you will get an exception from std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>>.

But if the file contains incomplete (but otherwise valid) UTF-16 (for example, the file ends with only the first half of a valid surrogate pair), you will get neither an exception from std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> nor any indication that the input provided to it was incomplete.
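Since the converter itself does not report truncated input, a caller who needs to tell the two cases apart has to inspect the UTF-16 data separately. The following is a minimal sketch of such a pre-check, assuming the data is already held in a std::u16string. The names utf16_check and check_utf16 are hypothetical, and this is a workaround on the caller's side, not something std::wstring_convert provides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for the pre-check below.
enum class utf16_check {
    complete,   // every surrogate is correctly paired
    truncated,  // input ends with the first half of a surrogate pair
    ill_formed  // a surrogate appears where it cannot be valid
};

// Hypothetical helper: classify a UTF-16 code unit sequence before conversion.
inline utf16_check check_utf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {            // high surrogate
            if (i + 1 == s.size())
                return utf16_check::truncated;       // nothing follows it
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF)
                return utf16_check::ill_formed;      // not followed by a low surrogate
            ++i;                                     // skip the low surrogate
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return utf16_check::ill_formed;          // low surrogate without a high one
        }
    }
    return utf16_check::complete;
}
```

With this check, input consisting of the single code unit 0xD843 is reported as truncated, so the caller can wait for more data or report an error instead of receiving a partial UTF-8 sequence without notice.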

History
Date                 User   Action  Args
2017-06-05 15:41:21  admin  set     messages: + msg9230
2017-06-05 15:41:21  admin  set     messages: + msg9229
2017-06-05 15:41:21  admin  set     status: lewg -> nad
2017-01-30 15:36:02  admin  set     messages: + msg8830
2017-01-30 15:36:02  admin  set     status: new -> lewg
2017-01-08 00:00:00  admin  create