Title: filesystem::u8path should be undeprecated
Status: open
Section: [depr.fs.path.factory]
Submitter: Daniel Krügler

Created on 2022-12-10.00:00:00, last changed 19 months ago.

Messages

Date: 2024-01-29.16:34:32

Proposed resolution:

This wording is relative to N4917.

  1. Restore the u8path declarations to [fs.filesystem.syn], header <filesystem> synopsis, as indicated:

    namespace std::filesystem {
      // [fs.class.path], paths
      class path;
    
      // [fs.path.nonmember], path non-member functions
      void swap(path& lhs, path& rhs) noexcept;
      size_t hash_value(const path& p) noexcept;
      
      // [fs.path.factory], path factory functions
      template<class Source>
        path u8path(const Source& source);
      template<class InputIterator>
        path u8path(InputIterator first, InputIterator last);
    
      // [fs.class.filesystem.error], filesystem errors
      class filesystem_error;
    […]
    }
    
  2. Restore the previous sub-clause [fs.path.factory] by copying the contents of [depr.fs.path.factory] to a new sub-clause [fs.path.factory] between [fs.path.nonmember] and [fs.path.hash], without Note 1, as indicated:

    [Drafting note: As an additional stylistic adaptation, we replace the obsolete Requires element with a Preconditions element plus a Mandates element (similar to that of [fs.path.construct] p5).

    As a second stylistic improvement, we replace the now unusual "if […]; otherwise" construction in the bullets with "Otherwise, if […]" constructions.]

    ? Factory functions [fs.path.factory]

    template<class Source>
      path u8path(const Source& source);
    template<class InputIterator>
      path u8path(InputIterator first, InputIterator last);
    

    -?- Mandates: The value type of Source and InputIterator is char or char8_t.

    -?- Preconditions: The source and [first, last) sequences are UTF-8 encoded.

    -?- Returns:

    (?.1) — If value_type is char and the current native narrow encoding ([fs.path.type.cvt]) is UTF-8, return path(source) or path(first, last).

    (?.2) — Otherwise, if value_type is wchar_t and the native wide encoding is UTF-16, or if value_type is char16_t or char32_t, convert source or [first, last) to a temporary, tmp, of type string_type and return path(tmp).

    (?.3) — Otherwise, convert source or [first, last) to a temporary, tmp, of type u32string and return path(tmp).

    -?- Remarks: Argument format conversion ([fs.path.fmt.cvt]) applies to the arguments for these functions. How Unicode encoding conversions are performed is unspecified.

    -?- [Example 1: A string is to be read from a database that is encoded in UTF-8, and used to create a directory using the native encoding for filenames:

    namespace fs = std::filesystem;
    std::string utf8_string = read_utf8_data();
    fs::create_directory(fs::u8path(utf8_string));
    

    For POSIX-based operating systems with the native narrow encoding set to UTF-8, no encoding or type conversion occurs.

    For POSIX-based operating systems with the native narrow encoding not set to UTF-8, a conversion to UTF-32 occurs, followed by a conversion to the current native narrow encoding. Some Unicode characters may have no native character set representation.

    For Windows-based operating systems a conversion from UTF-8 to UTF-16 occurs. — end example]

  3. Delete sub-clause [depr.fs.path.factory] in its entirety.

Date: 2023-05-15.00:00:00

[ 2023-05-30; status to "Open" ]

LEWG discussed this in January and had no consensus for undeprecation.

Date: 2023-01-15.00:00:00

[ 2023-01-06; Reflector poll ]

Set priority to 3 after reflector poll. Set status to LEWG.

Date: 2022-12-10.00:00:00

The filesystem::u8path function became deprecated with the adoption of P0482R6, but the rationale for that change is rather thin:

"The C++ standard must improve support for UTF-8 by removing the existing barriers that result in redundant tagging of character encodings, non-generic UTF-8 specific workarounds like u8path."

The u8path function is still useful if my original string source is a char sequence and I do know that the encoding of this sequence is UTF-8.

The deprecation note suggests that one should use std::u8string instead, which costs me an additional transformation and doesn't work without reinterpret_cast.
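To illustrate the extra transformation mentioned above, here is a minimal sketch of what a caller has to do with a char buffer known to hold UTF-8. The helper name path_from_utf8 is hypothetical and not part of any proposal. When char8_t is available, the sketch copies the bytes into a std::u8string (avoiding the reinterpret_cast at the cost of a copy); otherwise it falls back to u8path.

```cpp
#include <filesystem>
#include <string>
#include <string_view>

namespace fs = std::filesystem;

// Hypothetical helper (illustration only): build a path from a char
// sequence that the caller knows to be UTF-8 encoded.
inline fs::path path_from_utf8(std::string_view utf8) {
#if defined(__cpp_char8_t)
    // C++20 route: copy every byte into a char8_t string first.
    // This is the additional transformation that u8path made unnecessary.
    std::u8string tmp(utf8.begin(), utf8.end());
    return fs::path(tmp);
#else
    // Pre-C++20 route: u8path accepts the char sequence directly.
    return fs::u8path(utf8.begin(), utf8.end());
#endif
}
```

With u8path undeprecated, both branches could collapse into the single call in the fallback branch.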

Even in the presence of char8_t, legacy code bases are often still ABI-bound to char. In the future we may solve this problem using the tools provided by P2626 instead, but right now this is not part of the standard and it wasn't at the time when u8path became deprecated. This is in my opinion a good reason to undeprecate u8path now and decide later on the appropriate time to deprecate it again (if it really turns out to be made obsolete by alternative functionality).

Billy O'Neal provides a concrete example where the current deprecation status causes pain:

Example: vcpkg-tool files.cpp#L21-L45

Before p0482, we could just call std::u8path and it would do the right thing on both POSIX and Windows. After compilers started implementing '20, we have to make assumptions about the correct 'internal' std::path encoding because there is no longer a way to arrive to std::path with a char buffer that we know is UTF-8 encoded and get the correct results.

It's one of the reasons we completely ripped out use of std::filesystem on most platforms from vcpkg, so you won't see this in current sources.
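The kind of platform assumption described above might look roughly like the following sketch. It is not taken from the vcpkg sources. The helper name utf8_to_path_assuming_platform is hypothetical, and the POSIX branch simply assumes that the native narrow encoding is UTF-8, which is exactly the assumption u8path did not require.

```cpp
#include <filesystem>
#include <string>
#include <string_view>

#ifdef _WIN32
#include <windows.h>
#endif

namespace fs = std::filesystem;

// Hypothetical helper (illustration only): convert a UTF-8 char buffer
// to a path by hard-coding what each platform's native encoding is.
inline fs::path utf8_to_path_assuming_platform(std::string_view utf8) {
#ifdef _WIN32
    if (utf8.empty()) return fs::path();
    const int len = static_cast<int>(utf8.size());
    // First call: ask how many UTF-16 code units are needed.
    const int wide_len =
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, nullptr, 0);
    // On failure wide_len is 0 and an empty path is returned.
    std::wstring wide(static_cast<std::size_t>(wide_len), L'\0');
    // Second call: perform the UTF-8 to UTF-16 conversion.
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, wide.data(), wide_len);
    return fs::path(wide);
#else
    // Assumption: the native narrow encoding is UTF-8, so the bytes are
    // passed through unchanged. This is wrong on systems where it is not.
    return fs::path(std::string(utf8));
#endif
}
```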

History
Date                 User   Action  Args
2023-05-30 17:11:47  admin  set     messages: + msg13594
2023-05-30 17:11:47  admin  set     status: lewg -> open
2023-01-06 14:40:19  admin  set     messages: + msg13174
2023-01-06 14:40:19  admin  set     status: new -> lewg
2022-12-12 17:23:17  admin  set     messages: + msg13154
2022-12-10 00:00:00  admin  create