Issue 3965: Incorrect example in [format.string.escaped] p3 for formatting of combining characters

Title: Incorrect example in [format.string.escaped] p3 for formatting of combining characters
Status: c++26
Section: [format.string.escaped]
Submitter: Tom Honermann

Created on 2023-07-31.00:00:00 last changed 3 weeks ago

Messages

msg13849

Date: 2023-11-13.14:08:10

Proposed resolution:

This wording is relative to N4950 plus missing editorial pieces from P2286R8.

Modify the example following [format.string.escaped] p3 as indicated:

[Drafting note: The presented example was voted in as part of P2286R8 during the July 2022 Virtual Meeting but does not yet appear in the most recent working draft, N4950.
Note that the final character (♂️) is composed of the two code points U+2642 and U+FE0F. ]
```
- string s6 = format("[{:?}]", "🤷‍♂️"); // s6 has value: ["🤷\u{200d}♂\u{fe0f}"]
+ string s6 = format("[{:?}]", "🤷‍♂️"); // s6 has value: ["🤷\u{200d}♂️"]
```

msg13768

Date: 2023-11-11.00:00:00

[ 2023-11-11 Approved at November 2023 meeting in Kona. Status changed: Voting → WP. ]

msg13699

Date: 2023-10-15.00:00:00

[ 2023-10-27; Reflector poll ]

Set status to Tentatively Ready after six votes in favour during reflector poll.

msg13698

Date: 2023-07-31.00:00:00

The C++23 DIS contains the following example in [format.string.escaped] p3. (This example does not appear in the most recent working paper, N4950, or on https://eel.is/c++draft, because the project editor has not yet merged the changes needed to render some of the characters involved.)

```
string s6 = format("[{:?}]", "🤷‍♂️"); // s6 has value: ["🤷\u{200d}♂\u{fe0f}"]
```

The text to be formatted (🤷‍♂️), which displays as a single emoji, consists of the following code points, in order:

U+1F937 (SHRUG)
U+200D (ZERO WIDTH JOINER)
U+2642 (MALE SIGN)
U+FE0F (VARIATION SELECTOR-16)
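To check the code point breakdown above, the following minimal C++ sketch decodes the UTF-8 bytes of the example string. The names `decode_utf8` and `shrug_man` are hypothetical (not part of any library), and the decoder assumes well-formed UTF-8 input and performs no validation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes well-formed UTF-8 into code points.
// No validation is performed; malformed input gives meaningless results.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        // The lead byte determines the sequence length and its payload bits.
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)      { len = 1; cp = lead; }
        else if (lead < 0xE0) { len = 2; cp = lead & 0x1F; }
        else if (lead < 0xF0) { len = 3; cp = lead & 0x0F; }
        else                  { len = 4; cp = lead & 0x07; }
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// The example string, spelled as raw UTF-8 bytes so the result does not
// depend on the compiler's source or execution character set.
const std::string shrug_man =
    "\xF0\x9F\xA4\xB7"  // U+1F937 SHRUG
    "\xE2\x80\x8D"      // U+200D ZERO WIDTH JOINER
    "\xE2\x99\x82"      // U+2642 MALE SIGN
    "\xEF\xB8\x8F";     // U+FE0F VARIATION SELECTOR-16
```

Calling `decode_utf8(shrug_man)` yields U+1F937, U+200D, U+2642 and U+FE0F, in that order: four characters in the sense of [format.string.escaped], even though they render as one glyph.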

[format.string.escaped] bullet 2.2.1 specifies which code points are to be formatted as a \u{hex-digit-sequence} escape sequence:

(2.2.1) — If X encodes a single character C, then:
1. (2.2.1.1) — If C is one of the characters in Table 75 [tab:format.escape.sequences], then the two characters shown as the corresponding escape sequence are appended to E.
2. (2.2.1.2) — Otherwise, if C is not U+0020 SPACE and
  1. (2.2.1.2.1) — CE is UTF-8, UTF-16, or UTF-32 and C corresponds to a Unicode scalar value whose Unicode property General_Category has a value in the groups Separator (Z) or Other (C), as described by UAX #44 of the Unicode Standard, or
  2. (2.2.1.2.2) — CE is UTF-8, UTF-16, or UTF-32 and C corresponds to a Unicode scalar value with the Unicode property Grapheme_Extend=Yes as described by UAX #44 of the Unicode Standard and C is not immediately preceded in S by a character P appended to E without translation to an escape sequence, or
  3. (2.2.1.2.3) — CE is neither UTF-8, UTF-16, nor UTF-32 and C is one of an implementation-defined set of separator or non-printable characters
  then the sequence \u{hex-digit-sequence} is appended to E, where hex-digit-sequence is the shortest hexadecimal representation of C using lower-case hexadecimal digits.
3. (2.2.1.3) — Otherwise, C is appended to E.
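The bullets above amount to a per-character decision that depends on one piece of state: whether the previous character was appended without translation. The C++ sketch below models that decision under stated assumptions: CE is a Unicode encoding (so (2.2.1.2.3) never applies), the output is built as a `std::u32string` to sidestep re-encoding, the surrounding quotes that `{:?}` adds are omitted, and the helpers `in_separator_or_other`, `has_grapheme_extend`, `table75_escape` and `escape_chars` are hypothetical. The two property predicates are deliberately partial stand-ins for real Unicode Character Database lookups, and the Table 75 subset shown is an assumption.

```cpp
#include <string>

// Hypothetical, partial stand-in for "General_Category in group Z or C".
// Covers C0/C1 controls (Cc), SPACE (Zs), U+2028 (Zl), U+2029 (Zp) and
// U+200D (Cf); a real implementation would consult the UCD.
bool in_separator_or_other(char32_t c) {
    return c < 0x20 || (c >= 0x7F && c <= 0x9F)     // Cc
        || c == U' ' || c == 0x2028 || c == 0x2029  // Zs (SPACE), Zl, Zp
        || c == 0x200D;                             // Cf: ZERO WIDTH JOINER
}

// Hypothetical, partial stand-in for Grapheme_Extend=Yes: only U+FE0F
// VARIATION SELECTOR-16 (Mn) is recognized.
bool has_grapheme_extend(char32_t c) { return c == 0xFE0F; }

// (2.2.1.1): two-character escapes from Table 75 as used for strings
// (assumed here: \t, \n, \r, \" and \\).
const char32_t* table75_escape(char32_t c) {
    switch (c) {
        case U'\t': return U"\\t";
        case U'\n': return U"\\n";
        case U'\r': return U"\\r";
        case U'"':  return U"\\\"";
        case U'\\': return U"\\\\";
        default:    return nullptr;
    }
}

// Shortest lower-case hexadecimal representation of c.
std::u32string to_hex(char32_t c) {
    const char32_t digits[] = U"0123456789abcdef";
    std::u32string h;
    do {
        h.insert(h.begin(), digits[c % 16]);
        c /= 16;
    } while (c != 0);
    return h;
}

std::u32string escape_chars(const std::u32string& s) {
    std::u32string e;
    bool prev_verbatim = false;  // was the preceding character appended as is?
    for (char32_t c : s) {
        if (const char32_t* esc = table75_escape(c)) {     // (2.2.1.1)
            e += esc;
            prev_verbatim = false;
            continue;
        }
        const bool escape = c != U' ' &&                    // (2.2.1.2)
            (in_separator_or_other(c) ||                    // (2.2.1.2.1)
             (has_grapheme_extend(c) && !prev_verbatim));   // (2.2.1.2.2)
        if (escape) {
            e += U"\\u{";
            e += to_hex(c);
            e += U'}';
        } else {
            e += c;                                         // (2.2.1.3)
        }
        prev_verbatim = !escape;
    }
    return e;
}
```

With these stand-ins, `escape_chars` applied to the four code points of the example yields U+1F937, the six characters `\u{200d}`, U+2642 and U+FE0F, i.e. the text between the quotes of `["🤷\u{200d}♂️"]`. A lone U+FE0F with nothing before it, by contrast, comes out as `\u{fe0f}`, and SPACE passes through unescaped despite being in group Z, which is why the wording excludes U+0020 explicitly.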

The example is not consistent with the above specification for the final code point, U+FE0F. U+FE0F is a single character and is not one of the characters in Table 75, so (2.2.1.1) does not apply. It is not U+0020, but its General_Category is Nonspacing Mark (Mn), which is in neither group Z nor group C, so (2.2.1.2.1) does not apply. It has Grapheme_Extend=Yes, but it is immediately preceded by U+2642, which is itself appended to E without translation to an escape sequence, so (2.2.1.2.2) does not apply. Finally, the example assumes a UTF-8 encoding, so (2.2.1.2.3), which applies only when CE is neither UTF-8, UTF-16, nor UTF-32, does not apply either.

Thus, formatting for this character falls to the last bullet point, (2.2.1.3), and the character should be appended as is, without translation to an escape sequence. Since this character is a combining character, it combines with the preceding character and alters the appearance of U+2642, producing "♂️" instead of "♂\u{fe0f}".
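The bullet-by-bullet argument above can be traced mechanically. In this C++ sketch, `decide`, `in_group_z_or_c` and `grapheme_extend` are hypothetical names; the two predicates know only the relevant properties of the example's code points, and the input is assumed to contain no Table 75 characters and to be UTF-8 encoded, so neither (2.2.1.1) nor (2.2.1.2.3) can apply.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins covering only the example's code points.
bool in_group_z_or_c(char32_t c) { return c == 0x200D; }  // U+200D is Cf
bool grapheme_extend(char32_t c) { return c == 0xFE0F; }  // U+FE0F is Mn

// Returns, for each character, the sub-bullet of [format.string.escaped]
// 2.2.1 that decides how it is appended to E.
std::vector<std::string> decide(const std::u32string& s) {
    std::vector<std::string> out;
    bool prev_verbatim = false;  // was the preceding character appended as is?
    for (char32_t c : s) {
        if (c != U' ' && in_group_z_or_c(c)) {
            out.push_back("2.2.1.2.1: escaped");
            prev_verbatim = false;
        } else if (c != U' ' && grapheme_extend(c) && !prev_verbatim) {
            out.push_back("2.2.1.2.2: escaped");
            prev_verbatim = false;
        } else {
            out.push_back("2.2.1.3: appended as is");
            prev_verbatim = true;
        }
    }
    return out;
}
```

For U+1F937 U+200D U+2642 U+FE0F, `decide` reports (2.2.1.3), (2.2.1.2.1), (2.2.1.3), (2.2.1.3): U+FE0F lands in the last bullet because U+2642 before it was appended as is. Only when the character before U+FE0F has itself been escaped, as with the input U+1F937 U+200D U+FE0F, does (2.2.1.2.2) apply and produce `\u{fe0f}`.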

History
Date	User	Action	Args
2026-06-09 11:42:05	admin	set	status: wp -> c++26
2023-11-13 14:08:10	admin	set	messages: + msg13849
2023-11-13 14:08:10	admin	set	status: voting -> wp
2023-11-07 21:41:54	admin	set	status: ready -> voting
2023-10-27 21:22:44	admin	set	messages: + msg13768
2023-10-27 21:22:44	admin	set	status: new -> ready
2023-08-04 18:31:28	admin	set	messages: + msg13699
2023-07-31 00:00:00	admin	create