Title
Restrict the valid types of arg-id for width and precision in std-format-spec
Status
new
Section
[format.string.std]
Submitter
Mark de Wever

Created on 2022-06-19.00:00:00 last changed 1 month ago

Messages

Date: 2022-07-08.20:04:38

Proposed resolution:

This wording is relative to N4910.

  1. Modify [format.string.std] as indicated:

    -7- If { arg-idopt } is used in a width or precision, the value of the corresponding formatting argument is used in its place. If the corresponding formatting argument is not of integralsigned or unsigned integer type, or its value is negative for precision or non-positive for width, an exception of type format_error is thrown.

  2. Add a new paragraph to [diff.cpp20.utilities] as indicated:

    Affected subclause: [format]

    Change: Requirement changes of arg-id of the width and precision fields of std-format-spec. arg-id now requires a signed or unsigned integer type instead of an integral type.

    Rationale: Avoid types that are not useful and the need to specify enabled formatter specializations for all character types.

    Effect on original feature: Valid C++ 2020 code that passes a boolean or character type as arg-id becomes invalid. For example:

    std::format("{:*^{}}", "", true); // ill-formed, previously returned "*"
    
Date: 2022-07-15.00:00:00

[ 2022-07-08; Reflector poll ]

Set priority to 2 after reflector poll. Tim Song commented:

"This is technically a breaking change, so we should do it sooner rather than later.

"I don't agree with the second part of the argument though - I don't see how this wording requires adding those transcoding specializations. Nothing in this wording requires integral types that cannot be packed into basic_format_arg to be accepted.

"I also think we need to restrict this to signed or unsigned integer types with size no greater than sizeof(long long). Larger types get type-erased into a handle and the value isn't really recoverable without heroics."

Date: 2022-06-19.00:00:00

Per [format.string.std]/7

If { arg-idopt } is used in a width or precision, the value of the corresponding formatting argument is used in its place. If the corresponding formatting argument is not of integral type, or its value is negative for precision or non-positive for width, an exception of type format_error is thrown.

The issue is the integral type requirement. The following code is currently valid:

std::cout << std::format("{:*^{}}\n", 'a', '0');
std::cout << std::format("{:*^{}}\n", 'a', true);

The output of the first example depends on the value of '0' in the implementation. When a char has signed char as underlying type negative values are invalid, while the same value would be valid when the underlying type is unsigned char. For the second example the range of a boolean is very small, so this seems not really useful.

Currently libc++ rejects these two examples and MSVC STL accepts them. The members of the MSVC STL team, I spoke, agree these two cases should be rejected.

The following integral types are rejected by both libc++ and MSVC STL:

std::cout << std::format("{:*^{}}\n", 'a', L'0');
std::cout << std::format("{:*^{}}\n", 'a', u'0');
std::cout << std::format("{:*^{}}\n", 'a', U'0');
std::cout << std::format("{:*^{}}\n", 'a', u8'0');

In order to accept these character types they need to meet the basic formatter requirements per [format.functions]/20 and [format.functions]/25

formatter<remove_cvref_t<Ti>, charT> meets the BasicFormatter requirements ([formatter.requirements]) for each Ti in Args.

which requires adding the following enabled formatter specializations to [format.formatter.spec].

template<> struct formatter<wchar_t, char>;

template<> struct formatter<char8_t, charT>;
template<> struct formatter<char16_t, charT>;
template<> struct formatter<char32_t, charT>;

Note, that the specialization template<> struct formatter<char, wchar_t> is already required by the Standard.

Not only do they need to be added, but it also needs to be specified how they behave when their value is not in the range of representable values for charT.

Instead of requiring these specializations, I propose to go the other direction and limit the allowed types to signed and unsigned integers.

History
Date User Action Args
2022-07-08 20:04:38adminsetmessages: + msg12560
2022-06-25 18:31:19adminsetmessages: + msg12529
2022-06-19 00:00:00admincreate