Excluded characters in universal character names
5.3 [lex.charset]
Daveed Vandevoorde

Created on 2006-02-08.00:00:00, last changed 161 months ago


Date: 2007-10-15.00:00:00

[Moved to DR at October 2007 meeting.]

Date: 2007-10-15.00:00:00

Proposed resolution (October, 2007):

This issue is resolved by the adoption of paper J16/07-0030 = WG21 N2170.

Date: 2006-02-08.00:00:00

C99 and C++ differ in their approach to universal character names (UCNs).

Issue 248 already covers the differences in UCNs allowed for identifiers, but a more fundamental issue is that of UCNs that correspond to codes reserved by ISO/IEC 10646 for surrogate pair forms.

Specifically, C99 does not allow UCNs whose short names are in the range 0xD800 through 0xDFFF inclusive. I think C++ should have the same constraint. If someone really wants to place such a code in a character or string literal, they should use a hexadecimal escape sequence instead, for example:

    wchar_t  w1 = L'\xD900'; // Okay.
    wchar_t  w2 = L'\uD900'; // Error, not a valid character.

(Compare 6.4.3 paragraph 2 in ISO/IEC 9899:1999 with 5.3 [lex.charset] paragraph 2 in the C++ standard.)
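The range check that the proposed constraint implies can be sketched as follows. This is only an illustration: the helper name is_surrogate_code_point is hypothetical and does not come from either standard or from any compiler, and the sketch assumes the value of the UCN has already been parsed into an integer.

```cpp
// Hypothetical helper (not part of any standard or library): reports whether
// a code point named by a UCN falls in the range reserved for surrogate pair
// forms, 0xD800 through 0xDFFF inclusive.
constexpr bool is_surrogate_code_point(unsigned long code_point) {
    return code_point >= 0xD800UL && code_point <= 0xDFFFUL;
}

// Boundary checks, evaluated at compile time.
static_assert(!is_surrogate_code_point(0xD7FFUL), "just below the range");
static_assert(is_surrogate_code_point(0xD800UL), "first surrogate code");
static_assert(is_surrogate_code_point(0xD900UL), "the value used in the example");
static_assert(is_surrogate_code_point(0xDFFFUL), "last surrogate code");
static_assert(!is_surrogate_code_point(0xE000UL), "just above the range");
```

Under the constraint suggested here, an implementation would presumably apply a check of this kind only to \u and \U forms, leaving hexadecimal escape sequences such as \xD900 unaffected.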

Date                 User   Action  Args
2008-10-05 00:00:00  admin  set     status: wp -> cd1
2008-03-17 00:00:00  admin  set     status: dr -> wp
2007-10-09 00:00:00  admin  set     messages: + msg1549
2007-10-09 00:00:00  admin  set     messages: + msg1548
2007-10-09 00:00:00  admin  set     status: drafting -> dr
2006-04-22 00:00:00  admin  set     status: open -> drafting
2006-02-08 00:00:00  admin  create