Title
Standard exception messages have unspecified encoding
Status
open
Section
[exception]
Submitter
Victor Zverovich

Created on 2024-04-28.00:00:00 last changed 1 month ago

Messages

Date: 2024-05-08.09:50:29

Proposed resolution:

This wording is relative to N4981.

  1. Modify [exception] as indicated:

    virtual const char* what() const noexcept;
    

    Returns: An implementation-defined ntbs in the ordinary literal encoding.

    Remarks: The message may be a null-terminated multibyte string ([multibyte.strings]), suitable for conversion and display as a wstring ([string.classes], [locale.codecvt]). The return value remains valid until the exception object from which it is obtained is destroyed or a non-const member function of the exception object is called.

Date: 2024-05-15.00:00:00

[ 2024-05-08; Reflector poll ]

Set priority to 3 after reflector poll. Send to SG16.

Date: 2024-05-15.00:00:00

[ 2024-05-04; Daniel comments ]

The proposed wording is incomplete. There are about 12 other what specifications in the Standard Library with exactly the same specification as exception::what that would either need to get the same treatment or we would need general wording somewhere that says that the specification "contract" of exception::what extends to all of its derived classes. A third choice could be that we introduce a new definition such as an lntbs (or maybe "literal ntbs") that is essentially an ntbs in the ordinary literal encoding.

Date: 2024-04-28.00:00:00

The null-terminated multibyte string returned by the what member function of std::exception and of the standard classes derived from it has an unspecified encoding. The closest the specification comes is the "suitable for conversion and display as a wstring" wording in Remarks ([exception] p6), but that is too vague to be useful because anything can be converted to a wstring in one way or another:

virtual const char* what() const noexcept;

Returns: An implementation-defined ntbs.

Remarks: The message may be a null-terminated multibyte string ([multibyte.strings]), suitable for conversion and display as a wstring ([string.classes], [locale.codecvt]). The return value remains valid until the exception object from which it is obtained is destroyed or a non-const member function of the exception object is called.

As a result, it is impossible to use the exception message portably, e.g. to print it. Since exception messages are commonly combined with string literals and are often constructed from string literals, at the very least the standard should say that the message is compatible with them, i.e. that it is in the ordinary literal encoding or a subset of it.
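To make the "combined with string literals" pattern concrete, here is a minimal sketch. The helper name describe is hypothetical and not part of any library. The sketch relies only on the guarantee that the what() of std::runtime_error and std::logic_error returns the string passed to the constructor. For exceptions whose message is produced by the implementation, nothing in the current wording guarantees that the two parts share an encoding.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: prefixes an exception message with literal text.
// The prefix is in the ordinary literal encoding; the encoding of e.what()
// is whatever the thrower (or the implementation) produced.
std::string describe(const std::exception& e) {
  return std::string("error: ") + e.what();
}
```

With a message built from a literal, e.g. describe(std::runtime_error("disk full")), both parts are in the ordinary literal encoding and the result is well formed. The issue is about the cases where that assumption silently fails.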

To give a specific example of this problem, consider the following code compiled on Windows with Microsoft Visual C++, with UTF-8 as the ordinary literal encoding and the system locale set to Belarusian (the language of the text in this example):

std::uintmax_t size = 0;
try {
  size = std::filesystem::file_size(L"Шчучыншчына");
} catch (const std::exception& e) {
  std::print("Памылка: {}", e.what());
}

Since both std::filesystem::path and std::print support Unicode, one would expect this to work and, when run, to print a readable error message if the file "Шчучыншчына" doesn't exist. However, the output will be corrupted instead. The reason for the corruption is that the specification of filesystem_error::what recommends including the pathnames in the message but doesn't say that they should be transcoded ([fs.filesystem.error.members] p7):

virtual const char* what() const noexcept;

Returns: An ntbs that incorporates the what_arg argument supplied to the constructor. The exact format is unspecified. Implementations should include the system_error::what() string and the pathnames of path1 and path2 in the native format in the returned string.

Therefore, the message will contain literal text in the ordinary literal encoding (UTF-8) combined with a path, most likely in the operating-system-dependent current encoding for pathnames, which in this case is CP1251. Different parts of the output will thus be in two incompatible encodings, making the message unusable with std::print or any other facility.
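The mismatch can be illustrated without Windows or a particular standard library. The following is only a sketch under stated assumptions: the names looks_like_utf8, utf8_prefix, cp1251_path and mixed_message are hypothetical, the byte values assume the Windows-1251 mapping of the Cyrillic letters, and the check is a simplified structural one rather than a full UTF-8 validator. It does not claim to reproduce the exact string any implementation returns from what().

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Simplified structural UTF-8 check (lead and continuation bytes only).
// It does not reject every overlong form or surrogate code point.
bool looks_like_utf8(std::string_view s) {
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0;  // number of continuation bytes expected
    if (lead < 0x80) extra = 0;
    else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;
    else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;
    else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;
    else return false;  // stray continuation byte or invalid lead byte
    if (s.size() - i - 1 < extra) return false;  // truncated sequence
    for (std::size_t k = 1; k <= extra; ++k) {
      if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
    }
    i += extra + 1;
  }
  return true;
}

// "Памылка: " spelled as UTF-8 bytes, independent of the source encoding.
const std::string utf8_prefix =
    "\xD0\x9F\xD0\xB0\xD0\xBC\xD1\x8B\xD0\xBB\xD0\xBA\xD0\xB0: ";

// "Шчучыншчына" spelled as CP1251 bytes (one byte per letter).
const std::string cp1251_path =
    "\xD8\xF7\xF3\xF7\xFB\xED\xF8\xF7\xFB\xED\xE0";

// Stand-in for a what() result that mixes the two encodings.
const std::string mixed_message = utf8_prefix + cp1251_path;
```

Here looks_like_utf8(utf8_prefix) holds, while looks_like_utf8(mixed_message) does not: the first CP1251 byte (0xD8) looks like a two-byte UTF-8 lead, but the byte after it (0xF7) is not a continuation byte. A consumer that expects the ordinary literal encoding therefore has no valid way to interpret the whole string.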

The actual observable behavior for the above example is no output at all in the Windows console, which is extremely broken but appears to conform to the current specification. This was reproduced with {fmt}'s implementation of print since Microsoft STL doesn't implement std::print yet. Replacing std::print with another output facility produces a different but equally unusable form of mojibake.

History
Date                 User   Action  Args
2024-05-08 09:50:29  admin  set     messages: + msg14115
2024-05-08 09:50:29  admin  set     status: new -> open
2024-05-04 15:21:33  admin  set     messages: + msg14098
2024-05-04 15:21:33  admin  set     messages: + msg14097
2024-04-28 00:00:00  admin  create