Created on 2011-06-20.00:00:00 last changed 47 months ago
Additional note, February, 2021:
This issue was resolved editorially in N4842.
According to 5.3.1 [lex.charset] paragraph 2,
The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal value for a universal-character-name corresponds to a surrogate code point (in the range 0xD800-0xDFFF, inclusive), the program is ill-formed. Additionally, if the hexadecimal value for a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal corresponds to a control character (in either of the ranges 0x00-0x1F or 0x7F-0x9F, both inclusive) or to a character in the basic source character set, the program is ill-formed.
It is not specified what should happen if the hexadecimal value does not designate a Unicode code point: is that undefined behavior or does it make the program ill-formed?
As an aside, a note should be added explaining why these requirements apply to to an r-char-sequence when, as the footnote at the end of the paragraph explains,
A sequence of characters resembling a universal-character-name in an r-char-sequence (5.13.5 [lex.string]) does not form a universal-character-name.
History | |||
---|---|---|---|
Date | User | Action | Args |
2021-02-24 00:00:00 | admin | set | messages: + msg6524 |
2021-02-24 00:00:00 | admin | set | status: drafting -> cd5 |
2011-06-20 00:00:00 | admin | create |