Issue 3290: Are std::format field widths code units, code points, or something else?

Title: Are std::format field widths code units, code points, or something else?
Status: c++20
Section: [format.string.std]
Submitter: Tom Honermann

Created on 2019-09-08.00:00:00 last changed 56 months ago

Messages

msg11202 (view)

Date: 2020-04-07.00:00:00

[ 2020-04-07 Voted into the WP in Prague. Status changed: New → WP. ]

msg10651 (view)

Date: 2020-02-15.00:00:00

[ 2020-02-13, Prague ]

Resolved by P1868R2

msg10650 (view)

Date: 2020-02-13.10:31:32

[format.string.std] p7 states:

The positive-integer in width is a decimal integer defining the minimum field width. If width is not specified, there is no minimum field width, and the field width is determined based on the content of the field.

Is field width measured in code units, code points, or something else?

Consider the following example assuming a UTF-8 locale:

std::format("{}", "\xC3\x81");     // U+00C1        { LATIN CAPITAL LETTER A WITH ACUTE }
std::format("{}", "\x41\xCC\x81"); // U+0041 U+0301 { LATIN CAPITAL LETTER A } { COMBINING ACUTE ACCENT }

In both cases, the arguments encode the same user-perceived character (Á). The first uses two UTF-8 code units to encode a single code point that represents a single glyph using a composed Unicode normalization form. The second uses three code units to encode two code points that represent the same glyph using a decomposed Unicode normalization form.

How is the field width determined? If measured in code units, the first has a width of 2 and the second of 3. If measured in code points, the first has a width of 1 and the second of 2. If measured in grapheme clusters, both have a width of 1. Is the determination locale dependent?

Previous resolution [SUPERSEDED]:

This wording is relative to N4830.

Modify [format.string.std] as indicated:

-7- The positive-integer in width is a decimal integer defining the minimum field width. If width is not specified, there is no minimum field width, and the field width is determined based on the content of the field. Field width is measured in code units. Each byte of a multibyte character contributes to the field width.

History
Date	User	Action	Args
2021-02-25 10:48:01	admin	set	status: wp -> c++20
2020-04-07 12:58:55	admin	set	messages: + msg11202
2020-04-07 12:58:55	admin	set	status: new -> wp
2019-09-18 17:26:16	admin	set	messages: + msg10651
2019-09-08 00:00:00	admin	create