Created on 2026-04-27 00:00:00; last changed 2 weeks ago.
(From submission #893.)
According to 5.2 [lex.phases] bullet 1.3, universal-character-names (outside of character-literals, string-literals, and header-names) are replaced in phase 3 with their corresponding (single) character:
... As characters from the source file are consumed to form the next preprocessing token (i.e., not being consumed as part of a comment or other forms of whitespace), except when matching a c-char-sequence, s-char-sequence, r-char-sequence, h-char-sequence, or q-char-sequence, universal-character-names are recognized (5.3.2 [lex.universal.char]) and replaced by the designated element of the translation character set (5.3.1 [lex.charset]). ...
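The effect of the quoted rule can be modeled with a small sketch. This is a simplified illustration, not how any implementation works: the names `to_utf8`, `parse_hex`, and `replace_ucns_outside_literals` are hypothetical, the translation character set is assumed to be emitted as UTF-8, and raw strings, header-names, comments, digit separators, and the `\u{...}` and `\N{...}` forms are deliberately ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one code point as UTF-8 (no range validation).
static std::string to_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: read exactly n hex digits starting at s[pos].
static bool parse_hex(const std::string& s, std::size_t pos, std::size_t n,
                      std::uint32_t& value) {
    if (pos + n > s.size()) return false;
    value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const char c = s[pos + i];
        std::uint32_t digit;
        if (c >= '0' && c <= '9') digit = static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') digit = static_cast<std::uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') digit = static_cast<std::uint32_t>(c - 'A' + 10);
        else return false;
        value = value * 16 + digit;
    }
    return true;
}

// Simplified model of the quoted rule: \uXXXX and \UXXXXXXXX are replaced
// outside of '...' and "..." sequences and left untouched inside them.
std::string replace_ucns_outside_literals(const std::string& src) {
    std::string out;
    char quote = 0;  // 0 outside a literal, otherwise the active delimiter
    for (std::size_t i = 0; i < src.size();) {
        const char c = src[i];
        if (quote != 0) {
            out += c;
            if (c == '\\' && i + 1 < src.size()) {
                out += src[i + 1];  // copy the escaped character verbatim
                i += 2;
                continue;
            }
            if (c == quote) quote = 0;  // closing delimiter
            ++i;
            continue;
        }
        if (c == '"' || c == '\'') {
            quote = c;
            out += c;
            ++i;
            continue;
        }
        if (c == '\\' && i + 1 < src.size() &&
            (src[i + 1] == 'u' || src[i + 1] == 'U')) {
            const std::size_t digits = (src[i + 1] == 'u') ? 4 : 8;
            std::uint32_t cp = 0;
            if (parse_hex(src, i + 2, digits, cp)) {
                out += to_utf8(cp);
                i += 2 + digits;
                continue;
            }
        }
        out += c;
        ++i;
    }
    return out;
}
```

Under this model, the text `#define X \u03C0` becomes `#define X π` before any preprocessing token is formed, while the contents of `"\u03C0"` are passed through unchanged.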
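The phase 3 rule quoted above (and the surrounding change in treatment of UCNs) was introduced by paper P2314R4 (adopted in October 2021).
Implementations nevertheless disagree on whether the following macro redefinition is compatible: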
#define X π
#define X \u03C0 // clang and MSVC (old preprocessor) accept; gcc, EDG, and MSVC (new preprocessor) warn about incompatible macro redefinition
Also consider:
#include <stdio.h>
#define S1(...) # __VA_ARGS__
#define S2(...) # __VA_OPT__(__VA_ARGS__)
int main(){
#define X \u03C0
printf("%s %s\n", S1(X), S1(S1(X))); // output on all implementations: X S1(X)
printf("%s %s\n", S2(X), S2(S2(X))); // output on all implementations: π "\u03C0"
#define Y π
printf("%s %s\n", S2(Y), S2(S2(Y))); // output on all implementations: π "π"
}
Note that 15.7.3 [cpp.stringize] paragraph 2 talks about "original spelling", which might be interpreted as retaining UCNs:
... Otherwise, the original spelling of each preprocessing token in the stringizing argument is retained in the character string literal, except for special handling for producing the spelling of header-names, character-literals, and string-literals ...
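For tokens that involve no universal-character-names, the "original spelling" wording has a well-understood effect, which the following sketch illustrates. The macro `STR` and the three function names are hypothetical and chosen only for this example. The sketch does not settle how a UCN in the argument should be spelled.

```cpp
#include <string>

// STR is a hypothetical helper macro, not taken from the issue text.
#define STR(...) #__VA_ARGS__

// Whitespace between tokens collapses to a single space;
// each token otherwise keeps its spelling.
inline std::string spelled_expression() { return STR(a   +   b); }

// The spelling 0x1F is retained; it is not normalized to 31.
inline std::string spelled_number() { return STR(0x1F); }

// Special handling for string-literals: a backslash is inserted
// before each double quote and each backslash inside the literal.
inline std::string spelled_string_literal() { return STR("x\n"); }
```

Here `spelled_string_literal()` yields the six characters `"x\n"` (quotes and backslash included), so printing it reproduces the source spelling of the literal rather than its value.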
Furthermore, there is the question of whether universal-character-names can be formed using ## concatenation:
#define CAT(X,Y) X ## Y
#define Y CAT(\,u03C0)
int Y; // clang, gcc, EDG accept; MSVC (new preprocessor) rejects, because no valid preprocessing token is formed
Paper P2621R3 (adopted in June 2023) added the following note to 15.7.4 [cpp.concat] paragraph 3:
[Note 1: Concatenation can form a universal-character-name (5.3.1 [lex.charset]). —end note]
It is unclear what the normative basis for that note is, given that concatenation does not branch back to phase 3 where UCN recognition would happen. The implementation survey in P2621R3 indicated widespread implementation support for forming UCNs via concatenation.
History

| Date | User | Action | Args |
|---|---|---|---|
| 2026-04-27 00:00:00 | admin | create | |