Title: Clarify that std::string is not good for UTF-8
Status: c++20
Section: [depr.fs.path.factory]
Submitter: The Netherlands

Created on 2019-11-07.00:00:00 last changed 38 months ago

Messages

Date: 2020-02-12.03:28:55

Proposed resolution:

This wording is relative to N4835.

  1. Modify [depr.fs.path.factory] as indicated:

    -4- [Example: A string is to be read from a database that is encoded in UTF-8, and used to create a directory using the native encoding for filenames:

    namespace fs = std::filesystem;
    std::string utf8_string = read_utf8_data();
    fs::create_directory(fs::u8path(utf8_string));
    
    For POSIX-based operating systems with the native narrow encoding set to UTF-8, no encoding or type conversion occurs.

    For POSIX-based operating systems with the native narrow encoding not set to UTF-8, a conversion to UTF-32 occurs, followed by a conversion to the current native narrow encoding. Some Unicode characters may have no native character set representation.

    For Windows-based operating systems a conversion from UTF-8 to UTF-16 occurs. — end example]

    [Note: The example above is representative of a historical use of filesystem::u8path. Passing a std::u8string to path's constructor is preferred for an indication of UTF-8 encoding more consistent with path's handling of other encodings. — end note]

Date: 2020-02-12.03:28:55

[ 2020-02 Moved to Immediate on Tuesday in Prague. ]

Date: 2019-11-08.10:07:06

Addresses NL 375

The example in the deprecated section implies that std::string is the type to use for UTF-8 strings.

[Example: A string is to be read from a database that is encoded in UTF-8, and used to create a directory using the native encoding for filenames:

namespace fs = std::filesystem;
std::string utf8_string = read_utf8_data();
fs::create_directory(fs::u8path(utf8_string));

Proposed change:

Add a clarification that std::string is the wrong type for UTF-8 strings.

Jeff Garland:

SG16 in Belfast: Recommend to accept with a modification to update the example in [depr.fs.path.factory] p4 to state that std::u8string should be preferred for UTF-8 data.

Rationale: The example code is representative of historic use of std::filesystem::u8path and should not be changed to use std::u8string. The recommended change is to a non-normative example and may therefore be considered editorial.

Previous resolution [SUPERSEDED]:

This wording is relative to N4835.

  1. Modify [depr.fs.path.factory] as indicated:

    -4- [Example: A string is to be read from a database that is encoded in UTF-8, and used to create a directory using the native encoding for filenames:

    namespace fs = std::filesystem;
    std::string utf8_string = read_utf8_data();
    fs::create_directory(fs::u8path(utf8_string));
    
    For POSIX-based operating systems with the native narrow encoding set to UTF-8, no encoding or type conversion occurs.

    For POSIX-based operating systems with the native narrow encoding not set to UTF-8, a conversion to UTF-32 occurs, followed by a conversion to the current native narrow encoding. Some Unicode characters may have no native character set representation.

    For Windows-based operating systems a conversion from UTF-8 to UTF-16 occurs. — end example]

    [Note: The example above is representative of historic use of filesystem u8path. New code should use std::u8string in place of std::string. — end note]

LWG Belfast Friday Morning

Requested changes:

  • Historic => historical.
  • Add missing :: before u8path.
  • Remove the 'should', which ISO rules forbid in a note.
  • Use language describing why new code should use the u8string constructor rather than preaching that new code should do something.
Billy O'Neal provides updated wording.

History
Date                 User   Action  Args
2021-02-25 10:48:01  admin  set     status: wp -> c++20
2020-02-24 16:02:59  admin  set     status: immediate -> wp
2020-02-12 03:28:55  admin  set     messages: + msg11040
2020-02-12 03:28:55  admin  set     status: new -> immediate
2019-11-07 19:10:57  admin  set     messages: + msg10797
2019-11-07 00:00:00  admin  create