Title
Restrictions on the ordinary literal encoding
Status
open
Section
5.3.1 [lex.charset]
Submitter
Jim X

Created on 2023-03-28.00:00:00 last changed 21 months ago

Messages

Date: 2023-08-01.18:23:09

There are no restrictions on the implementation's choice of ordinary literal encoding. However, there is an implicit assumption that a code unit value must fit into a char.
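The implicit assumption can be illustrated with a minimal sketch, assuming a C++17 or later compiler. The helper name `fits_in_one_char` is hypothetical and not part of the issue or the standard. It only shows that the element type of an ordinary string literal is `char`, so any code unit value of the ordinary literal encoding has to be storable in a single `char` object.

```cpp
#include <limits>
#include <type_traits>

// An ordinary string literal is an lvalue of type "array of const char",
// so every code unit of the ordinary literal encoding is stored in a char.
static_assert(std::is_same_v<decltype("a"), const char (&)[2]>);

// An ordinary character literal with a single c-char has type char.
static_assert(std::is_same_v<decltype('a'), char>);
static_assert(sizeof('a') == 1);

// Hypothetical helper (not from the issue): reports whether a candidate
// code unit value is within the range of values one char object can hold,
// viewed as unsigned char.
constexpr bool fits_in_one_char(unsigned long code_unit) {
    return code_unit <= std::numeric_limits<unsigned char>::max();
}
```

On an implementation where `CHAR_BIT` is 8 (an assumption, not a requirement), `fits_in_one_char(0xFF)` is true and `fits_in_one_char(0x100)` is false, so an encoding with 16-bit code units could not serve as the ordinary literal encoding there.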

Tangentially related to that, "cannot be encoded as a single code unit" could be interpreted as referring to the values of the code units as opposed to the fact that multiple code units might be needed.

Possible resolution:

  1. Change in 5.3.1 [lex.charset] paragraph 8 as follows and add to the index of implementation-defined behavior:

    A code unit is an integer value of character type (6.8.2 [basic.fundamental]). Characters in a character-literal other than a multicharacter or non-encodable character literal or in a string-literal are encoded as a sequence of one or more code units, as determined by the encoding-prefix (5.13.3 [lex.ccon], 5.13.5 [lex.string]); this is termed the respective literal encoding. The ordinary literal encoding is the implementation-defined encoding applied to an ordinary character or string literal; its code units are of type unsigned char. The wide literal encoding is the implementation-defined encoding applied to a wide character or string literal; its code units are of type wchar_t.
  2. Change in 5.13.3 [lex.ccon] bullet 3.1 as follows:

    • A character-literal with a c-char-sequence consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name is the code unit value of the specified character as encoded in the literal's associated character encoding. If the specified character lacks representation in the literal's associated character encoding or if it [deleted: cannot be encoded as a single code unit] [inserted: is encoded with multiple code units], then the program is ill-formed.
    • ...
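The distinction between the value of a code unit and the number of code units can be sketched as follows. The sketch assumes C++17 or later and uses the UTF-8, UTF-16, and UTF-32 literal encodings, because the ordinary literal encoding is implementation-defined. The helper names `code_unit_count` and `code_unit_value` are hypothetical and appear neither in the issue nor in the standard.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null code unit.
template <class CharT, std::size_t N>
constexpr std::size_t code_unit_count(const CharT (&)[N]) {
    return N - 1;
}

// Hypothetical helper: view an ordinary code unit as an unsigned char value,
// matching the proposed wording that ordinary code units are of type
// unsigned char.
constexpr unsigned code_unit_value(char c) {
    return static_cast<unsigned char>(c);
}

// U+00E9 needs two UTF-8 code units but only one UTF-16 code unit.
static_assert(code_unit_count(u8"\u00E9") == 2);
static_assert(code_unit_count(u"\u00E9") == 1);
static_assert(u'\u00E9' == 0x00E9);

// U+1F600 needs a UTF-16 surrogate pair but only one UTF-32 code unit.
static_assert(code_unit_count(u"\U0001F600") == 2);
static_assert(U'\U0001F600' == 0x1F600);

// Ill-formed, because the character is encoded with multiple code units:
//   u8'\u00E9'
//   u'\U0001F600'
```

The commented-out literals are the cases the reworded bullet targets: each character has a representation in the encoding, but that representation spans more than one code unit. Separately, `code_unit_value("\xFF"[0])` yields 0xFF on common implementations even where plain `char` is signed, which is the kind of value question the `unsigned char` wording is meant to settle.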
History
Date User Action Args
2023-03-28 00:00:00 admin create