Title
Clarifying fill character in std::format
Status
open
Section
[format.string.std]
Submitter
Mark de Wever

Created on 2021-08-01.00:00:00 last changed 3 weeks ago

Messages

Date: 2021-08-27.19:08:25

Proposed resolution:

This wording is relative to N4892.

  1. Modify [format.string.std] as indicated:

    -1- […] The syntax of format specifications is as follows:

    […]
    fill:
                 any [-character-] [+codepoint of the literal encoding+] other than { or }
    […]
    

    -2- [Note 2: The fill character can be any [-character-] [+codepoint+] other than { or }. The presence of a fill character is signaled by the character following it, which must be one of the alignment options. If the second character of std-format-spec is not a valid alignment option, then it is assumed that both the fill character and the alignment option are absent. — end note]
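
    The detection rule in the note can be sketched as follows. This is only an illustration, not wording and not any library's implementation: it assumes a well-formed UTF-8 format specification, and the names fill_align, utf8_length, is_align and parse_fill_align are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type: the fill (as raw UTF-8 code units) and the alignment option.
struct fill_align {
    std::string_view fill;  // empty when no fill is present
    char align;             // one of '<', '>', '^'
};

// Number of code units in the UTF-8 sequence that starts with lead byte `b`.
// Assumption: the input is well-formed; any other byte is treated as length 1.
constexpr std::size_t utf8_length(unsigned char b) {
    if (b >= 0xF0) return 4;
    if (b >= 0xE0) return 3;
    if (b >= 0xC0) return 2;
    return 1;
}

constexpr bool is_align(char c) { return c == '<' || c == '>' || c == '^'; }

// Parses the optional fill-and-align prefix of `spec` (the text after the ':').
std::optional<fill_align> parse_fill_align(std::string_view spec) {
    if (spec.empty()) return std::nullopt;
    const std::size_t n = utf8_length(static_cast<unsigned char>(spec[0]));
    // A fill is present only if the first code point is followed by an alignment option.
    if (n < spec.size() && is_align(spec[n]) && spec[0] != '{' && spec[0] != '}')
        return fill_align{spec.substr(0, n), spec[n]};
    // Otherwise the first character may itself be the alignment option, without a fill.
    if (is_align(spec[0])) return fill_align{std::string_view{}, spec[0]};
    return std::nullopt;
}
```

    With this sketch, "*<10" yields fill "*" and alignment '<', while "<10" yields an empty fill and alignment '<'. A two-code-unit fill such as U+00E9 is accepted because the lookahead skips a whole code point rather than one char.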

Date: 2021-08-15.00:00:00

[ 2021-08-26; SG16 reviewed and provides alternative wording ]

Date: 2021-08-15.00:00:00

[ 2021-08-20; Reflector poll ]

Set priority to 2 and status to "SG16" after reflector poll.

Previous resolution [SUPERSEDED]:

This wording is relative to N4892.

  1. Modify [format.string.std] as indicated:

    -1- […] The syntax of format specifications is as follows:

    […]
    fill:
                 any Unicode grapheme cluster or character other than { or }
    […]
    

    -2- [Note 2: The fill character can be any character other than { or }. For a string in a Unicode encoding, the fill character can be any Unicode grapheme cluster other than { or }. For a string in a non-Unicode encoding, the fill character can be any character other than { or }. The output width of the fill character is always assumed to be one column.

    [Note 2: The presence of a fill character is signaled by the character following it, which must be one of the alignment options. If the second character of std-format-spec is not a valid alignment option, then it is assumed that both the fill character and the alignment option are absent. — end note]

Date: 2021-08-15.00:00:00

[ 2021-08-09; Mark de Wever provides improved wording ]

Date: 2021-08-14.19:11:32

The paper P1868 "width: clarifying units of width and precision in std::format" added optional Unicode support to the format header. This paper didn't update the definition of the fill character, which is defined as

"The fill character can be any character other than { or }."

This wording means the fill is a character and not a Unicode grapheme cluster. Based on the current wording, the range of available fill characters depends on the char_type of the format string. After P1868 the determination of the required padding size is Unicode aware, but it's not possible to use a Unicode grapheme cluster as padding. This looks odd from a user's perspective and has already led to implementation divergence between libc++ and MSVC STL:

  • The WIP libc++ implementation stores one char_type, strictly adhering to the wording of the Standard.

  • MSVC STL stores one code point, regardless of the char_type used. This is already better from a user's perspective; all grapheme clusters that consist of a single code point are properly handled.
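
The difference between the two approaches can be made concrete with a small sketch. It assumes well-formed UTF-8 input, and count_code_points is a hypothetical helper, not part of either implementation.

```cpp
#include <cstddef>
#include <string_view>

// Counts code points in well-formed UTF-8 by skipping continuation bytes (10xxxxxx).
constexpr std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80) ++n;
    return n;
}

// U+00E9 encoded in UTF-8: one code point, but two char code units,
// so it does not fit in a single char fill.
constexpr std::string_view e_acute = "\xC3\xA9";
static_assert(e_acute.size() == 2);
static_assert(count_code_points(e_acute) == 1);

// U+0065 U+0301 (e followed by a combining acute accent): one grapheme cluster,
// but two code points, so it does not fit in a single code point fill either.
constexpr std::string_view e_combining = "e\xCC\x81";
static_assert(count_code_points(e_combining) == 2);
```

Storing one char_type rejects the first fill for a char format string. Storing one code point accepts it but still cannot represent the second.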

For the width calculation, the width of a Unicode grapheme cluster is estimated to be 1 or 2 columns. Since a fill with a column width of 2 can't properly pad an odd number of columns, the grapheme cluster used as fill should always have a column width of 1.
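
The arithmetic behind this can be sketched as follows. The type padding and the function compute_padding are hypothetical names used only for illustration.

```cpp
#include <cstddef>

// Hypothetical helper result: how many fill clusters fit into the gap,
// and how many columns are left that the fill cannot cover.
struct padding {
    std::size_t fills;     // number of fill clusters emitted
    std::size_t leftover;  // columns that remain unfilled
};

constexpr padding compute_padding(std::size_t width, std::size_t content_columns,
                                  std::size_t fill_columns) {
    // No padding is needed when the content already reaches the requested width.
    if (content_columns >= width || fill_columns == 0) return {0, 0};
    const std::size_t gap = width - content_columns;
    return {gap / fill_columns, gap % fill_columns};
}
```

For a width of 10 and content of 5 columns, a 1-column fill gives 5 fills and no leftover, whereas a 2-column fill gives 2 fills and 1 column that cannot be padded.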

The precondition can be validated either by the library or by the user. It would be possible to do the validation at compile time and make the code ill-formed when the precondition is violated. For the following reasons I think it's better not to validate the width:

  • P1868 14. Implementation

    "More importantly, our approach permits refining the definition in the future if there is interest in doing so. It will mostly require researching the status of Unicode support on terminals and minimal or no changes to the implementation."

    If an estimated width of 1 were required, refining the width estimation in a later revision of the Standard could make previously valid code ill-formed.

  • P1868 13. Examples

    The example of the family grapheme cluster is only rendered properly on the macOS terminal. So even when the library does a proper validation, it's not certain the output will be rendered properly.

Changing the fill type changes the size of the std::formatter and thus will be an ABI break.
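
A rough illustration of why the stored fill type affects the object size. Both structs are hypothetical layouts and not the members of any real std::formatter, which stores more fields. The comparison assumes a typical platform where char32_t is wider than char.

```cpp
// Hypothetical parsed-spec layout that stores the fill as one char code unit.
struct spec_with_char_fill {
    char fill;
    char align;
};

// Hypothetical parsed-spec layout that stores the fill as one code point.
struct spec_with_code_point_fill {
    char32_t fill;
    char align;
};

// Assumption: char32_t is wider than char, as on mainstream platforms.
static_assert(sizeof(spec_with_code_point_fill) > sizeof(spec_with_char_fill),
              "a wider fill member enlarges the object");
```

Code compiled against the smaller layout would disagree with code compiled against the larger one about the size of the same type, which is what makes the change an ABI break.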

The proposed resolution probably needs some additional changes, since Unicode handling and output width are specified later in the Standard, specifically in [format.string.std]/9-12.

Previous resolution [SUPERSEDED]:

This wording is relative to N4892.

  1. Modify [format.string.std] as indicated:

    -2- [Note 2: The fill character can be any character other than { or }. For a string in a Unicode encoding, the fill character can be any Unicode grapheme cluster other than { or }. For a string in a non-Unicode encoding, the fill character can be any character other than { or }. The output width of the fill character is always assumed to be one column. The presence of a fill character is signaled by the character following it, which must be one of the alignment options. If the second character of std-format-spec is not a valid alignment option, then it is assumed that both the fill character and the alignment option are absent. — end note]

History
Date                 User   Action  Args
2021-08-27 19:08:25  admin  set     messages: + msg12022
2021-08-20 17:06:18  admin  set     messages: + msg11989
2021-08-20 17:06:18  admin  set     status: new -> open
2021-08-14 19:11:32  admin  set     messages: + msg11984
2021-08-07 17:19:07  admin  set     messages: + msg11981
2021-08-01 00:00:00  admin  create