Title
Representation of source characters as universal-character-names
Status
cd4
Section
5.2 [lex.phases]
Submitter
Richard Smith

Created on 2014-09-09.00:00:00 last changed 94 months ago

Messages

Date: 2015-05-15.00:00:00

[Moved to DR at the May, 2015 meeting.]

Date: 2015-04-15.00:00:00

Proposed resolution (April, 2015):

Change 5.2 [lex.phases] paragraph 1 number 1 as follows, replacing "i.e." with "e.g.":

...(An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (e.g., using the \uXXXX notation), are handled equivalently except where this replacement is reverted in a raw string literal.)

Date: 2014-09-09.00:00:00

According to 5.2 [lex.phases] paragraph 1, first phase,

Any source file character not in the basic source character set (5.3 [lex.charset]) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the \uXXXX notation), are handled equivalently except where this replacement is reverted in a raw string literal.)

This wording is obviously not intended to exclude the use of characters with code points larger than 0xffff, but the reference to “the \uXXXX notation” might suggest that the \Uxxxxxxxx form is not allowed.
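
The two spellings at issue can be illustrated with a minimal sketch. The helper name to_ucn is hypothetical and is not taken from the standard or from any implementation; it only shows that a code point up to 0xffff fits the \uXXXX form while a larger one needs the \UXXXXXXXX form. The static_asserts check that both forms designate the same character where they overlap.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Both spellings name the same character when the code point fits in 16 bits.
static_assert(U'\u00E9' == U'\U000000E9', "short and long forms agree");
// A code point above 0xffff can only be written with the long form.
static_assert(U'\U0001F600' == 0x1F600, "long form reaches beyond 0xffff");

// Hypothetical helper: spell a code point as a universal-character-name,
// choosing the short form when four hex digits suffice.
std::string to_ucn(std::uint32_t code_point) {
    assert(code_point <= 0x10FFFF);  // outside this range there is no such character
    char buf[11];                    // backslash, 'U', 8 hex digits, terminator
    if (code_point <= 0xFFFF) {
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(code_point));
    } else {
        std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(code_point));
    }
    return buf;
}
```

Under these assumptions, to_ucn(0xE9) yields a six-character string in the short form, and to_ucn(0x1F600) yields a ten-character string in the long form.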

History
Date User Action Args
2017-02-06 00:00:00 admin set status: drwp -> cd4
2015-11-10 00:00:00 admin set status: dr -> drwp
2015-05-25 00:00:00 admin set messages: + msg6036
2015-05-25 00:00:00 admin set status: tentatively ready -> dr
2015-04-13 00:00:00 admin set messages: + msg5308
2015-04-13 00:00:00 admin set status: drafting -> tentatively ready
2014-09-09 00:00:00 admin create