Title
Restrict the valid types of arg-id for width and precision in std-format-spec
Status
c++23
Section
[format.string.std]
Submitter
Mark de Wever

Created on 2022-06-19.00:00:00 last changed 13 months ago

Messages

Date: 2023-02-13.10:17:57

Proposed resolution:

This wording is relative to N4917.

  1. Modify [format.string.std] as indicated:

    -8- If { arg-id_opt } is used in a width or precision, the value of the corresponding formatting argument is used in its place. If the corresponding formatting argument is not of [deleted: integral] [inserted: standard signed or unsigned integer] type, or its value is negative, an exception of type format_error is thrown.

  2. Add a new paragraph to [diff.cpp20.utilities] as indicated:

    Affected subclause: [format.string.std]

    Change: Restrict types of formatting arguments used as width or precision in a std-format-spec.

    Rationale: Disallow types that do not have useful or portable semantics as a formatting width or precision.

    Effect on original feature: Valid C++ 2020 code that passes a boolean or character type as arg-id becomes invalid. For example:

    std::format("{:*^{}}", "", true); // ill-formed, previously returned "*"
    std::format("{:*^{}}", "", '1'); // ill-formed, previously returned an implementation-defined number of '*' characters
    
Date: 2023-02-13.00:00:00

[ 2023-02-13 Approved at February 2023 meeting in Issaquah. Status changed: Voting → WP. ]

Date: 2022-11-10.20:21:57

[ Kona 2022-11-10; Move to Ready ]

Date: 2022-11-15.00:00:00

[ 2022-11-10; Jonathan revises wording ]

Improve Annex C entry.

Date: 2022-11-15.00:00:00

[ 2022-11-01; Jonathan provides improved wording ]

Previous resolution [SUPERSEDED]:

This wording is relative to N4917.

  1. Modify [format.string.std] as indicated:

    -8- If { arg-id_opt } is used in a width or precision, the value of the corresponding formatting argument is used in its place. If the corresponding formatting argument is not of [deleted: integral] [inserted: standard signed or unsigned integer] type, or its value is negative, an exception of type format_error is thrown.

  2. Add a new paragraph to [diff.cpp20.utilities] as indicated:

    Affected subclause: [format.string.std]

    Change: Restrict types of formatting arguments used as width or precision in a std-format-spec.

    Rationale: Avoid types that are not useful or do not have portable semantics.

    Effect on original feature: Valid C++ 2020 code that passes a boolean or character type as arg-id becomes invalid. For example:

    std::format("{:*^{}}", "", true); // ill-formed, previously returned "*"
    std::format("{:*^{}}", "", '1'); // ill-formed, previously returned an implementation-defined number of '*' characters
    
Date: 2022-07-15.00:00:00

[ 2022-07-08; Reflector poll ]

Set priority to 2 after reflector poll. Tim Song commented:

"This is technically a breaking change, so we should do it sooner rather than later.

"I don't agree with the second part of the argument though - I don't see how this wording requires adding those transcoding specializations. Nothing in this wording requires integral types that cannot be packed into basic_format_arg to be accepted.

"I also think we need to restrict this to signed or unsigned integer types with size no greater than sizeof(long long). Larger types get type-erased into a handle and the value isn't really recoverable without heroics."

Previous resolution [SUPERSEDED]:

This wording is relative to N4910.

  1. Modify [format.string.std] as indicated:

    -7- If { arg-id_opt } is used in a width or precision, the value of the corresponding formatting argument is used in its place. If the corresponding formatting argument is not of [deleted: integral] [inserted: signed or unsigned integer] type, or its value is negative for precision or non-positive for width, an exception of type format_error is thrown.

  2. Add a new paragraph to [diff.cpp20.utilities] as indicated:

    Affected subclause: [format]

    Change: Requirement changes of arg-id of the width and precision fields of std-format-spec. arg-id now requires a signed or unsigned integer type instead of an integral type.

    Rationale: Avoid types that are not useful and the need to specify enabled formatter specializations for all character types.

    Effect on original feature: Valid C++ 2020 code that passes a boolean or character type as arg-id becomes invalid. For example:

    std::format("{:*^{}}", "", true); // ill-formed, previously returned "*"
    
Date: 2022-06-19.00:00:00

Per [format.string.std]/7

If { arg-id_opt } is used in a width or precision, the value of the corresponding formatting argument is used in its place. If the corresponding formatting argument is not of integral type, or its value is negative for precision or non-positive for width, an exception of type format_error is thrown.

The issue is the integral type requirement. The following code is currently valid:

std::cout << std::format("{:*^{}}\n", 'a', '0');
std::cout << std::format("{:*^{}}\n", 'a', true);

The output of the first example depends on the value of '0' in the implementation. When char has signed char as its underlying type, negative values are invalid, while the same value would be valid when the underlying type is unsigned char. For the second example, the range of a boolean is very small, so this does not seem useful.
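The portability problem can be made concrete with a small model of the quoted rule applied to a char argument. This is a sketch, not library code: width_from_char is a hypothetical function that treats the character's numeric value as the width and rejects non-positive values, as the quoted paragraph does for width.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>

// Hypothetical model of the quoted rule for a width argument of type
// char: the numeric value of the character is the width, and a
// non-positive value is an error (modelled here as an empty optional).
constexpr std::optional<int> width_from_char(char c) {
    int value = c;  // the sign of the result depends on whether char is signed
    if (value <= 0) return std::nullopt;
    return value;
}
```

Under this model, '0' yields whatever code the execution character set assigns to it (48 in ASCII-based encodings), and a character such as static_cast<char>(0xE9) is accepted only on implementations where char is unsigned.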

Currently libc++ rejects these two examples and MSVC STL accepts them. The members of the MSVC STL team I spoke with agree that these two cases should be rejected.

The following integral types are rejected by both libc++ and MSVC STL:

std::cout << std::format("{:*^{}}\n", 'a', L'0');
std::cout << std::format("{:*^{}}\n", 'a', u'0');
std::cout << std::format("{:*^{}}\n", 'a', U'0');
std::cout << std::format("{:*^{}}\n", 'a', u8'0');

In order to accept these character types, their formatter specializations need to meet the BasicFormatter requirements per [format.functions]/20 and [format.functions]/25:

formatter<remove_cvref_t<Ti>, charT> meets the BasicFormatter requirements ([formatter.requirements]) for each Ti in Args.

which requires adding the following enabled formatter specializations to [format.formatter.spec].

template<> struct formatter<wchar_t, char>;

template<> struct formatter<char8_t, charT>;
template<> struct formatter<char16_t, charT>;
template<> struct formatter<char32_t, charT>;

Note that the specialization template<> struct formatter<char, wchar_t> is already required by the Standard.

Not only do they need to be added, but it also needs to be specified how they behave when their value is not in the range of representable values for charT.

Instead of requiring these specializations, I propose to go the other direction and limit the allowed types to signed and unsigned integers.

History
Date                 User   Action  Args
2023-11-22 15:47:43  admin  set     status: wp -> c++23
2023-02-13 10:17:57  admin  set     messages: + msg13352
2023-02-13 10:17:57  admin  set     status: voting -> wp
2023-02-06 15:33:48  admin  set     status: ready -> voting
2022-11-10 20:21:57  admin  set     messages: + msg12994
2022-11-10 20:21:57  admin  set     messages: + msg12993
2022-11-10 20:21:57  admin  set     status: new -> ready
2022-11-01 14:28:29  admin  set     messages: + msg12900
2022-07-08 20:04:38  admin  set     messages: + msg12560
2022-06-25 18:31:19  admin  set     messages: + msg12529
2022-06-19 00:00:00  admin  create