Title
Rework phases for string literal concatenation and token formation
Status
ready
Section
5.2 [lex.phases]
Submitter
US

Created on 2025-10-01.00:00:00 last changed 1 month ago

Messages

Date: 2025-11-04.22:23:01

Proposed resolution (approved by CWG 2025-11-04):

  1. Change in 5.2 [lex.phases] paragraph 5 through 7 as follows:

    5. For a sequence of two or more adjacent string-literal preprocessing tokens, a common encoding-prefix is determined as specified in 5.13.5 [lex.string]. Each such string-literal preprocessing token is then considered to have that common encoding-prefix. Then, adjacent string-literal preprocessing tokens are concatenated (5.13.5 [lex.string]).

    6. Each preprocessing token is converted into a token (5.10 [lex.token]).

    7. The resulting tokens constitute a translation unit and are syntactically and semantically analyzed as a translation-unit (6.7 [basic.link]) and translated. ...

  2. Change in 5.5 [lex.pptoken] paragraph 1 as follows:

    A preprocessing token is the minimal lexical element of the language in translation phases 3 through 5.

  3. Change in 5.8 [lex.operators] paragraph 1 as follows:

    ... Each operator-or-punctuator is converted to a single token in translation phase 6 (5.2 [lex.phases]).

  4. Change in 5.13.5 [lex.string] paragraph 8 as follows:

    In translation phase 5 (5.2 [lex.phases]), adjacent string-literals are concatenated. The lexical structure and grouping of the contents of the individual string-literals is retained.

  5. Change in 5.13.9 [lex.ext] paragraph 8 as follows:

    In translation phase 5 (5.2 [lex.phases]), adjacent string-literals are concatenated and user-defined-string-literals are considered string-literals for that purpose. During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in 5.13.5 [lex.string]. At the end of phase 5, if a string-literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix and that suffix is applied to the result of the concatenation.

  6. Change in 21.4.16 [meta.reflection.define.aggregate] bullet 5.2 as follows (addresses alternative tokens (e.g. xor) and exceptions instead of evaluation failure):

    Throws: meta::exception unless the following conditions are met:
    • ...
    • if options.name contains a value, then:
      • holds_alternative<u8string>(options.name->contents) is true and get<u8string>(options.name->contents) contains the spelling of a valid token that is an identifier (5.11 [lex.name]) that is not a keyword (5.12 [lex.key]) when interpreted with UTF-8, or
      • holds_alternative<string>(options.name->contents) is true and get<string>(options.name->contents) contains the spelling of a valid token that is an identifier (5.11 [lex.name]) that is not a keyword (5.12 [lex.key]) when interpreted with the ordinary literal encoding;
      [Note 3: The name corresponds to the spelling of an identifier token after phase 6 of translation (5.2 [lex.phases]). Lexical constructs like universal-character-names (5.3.2 [lex.universal.char]) are not processed and will cause evaluation to fail. For example, R"(\u03B1)" is an invalid identifier and is not interpreted as "α". —end note]
    • ...
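
The two cases this change addresses can be sketched in ordinary C++ without the reflection API. This is an illustration only, not part of the proposed wording: raw_name and alt_xor are hypothetical names introduced here. It shows that a raw string literal keeps \u03B1 as six separate characters, which therefore do not spell an identifier, and that an alternative token such as xor is an operator token rather than an identifier.

```cpp
#include <string_view>

// Hypothetical name: a raw string literal does not process
// universal-character-names, so this holds the six characters
// '\', 'u', '0', '3', 'B', '1' rather than a single alpha.
constexpr std::string_view raw_name = R"(\u03B1)";
static_assert(raw_name.size() == 6);
static_assert(raw_name[0] == '\\');

// Hypothetical helper: the alternative token "xor" behaves as the
// operator ^, so its spelling is not the spelling of an identifier token.
constexpr int alt_xor(int a, int b) { return a xor b; }
static_assert(alt_xor(5, 3) == 6);
```

Some compilers accept alternative tokens only in a conforming mode, so the last two declarations may need the appropriate option there.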
Date: 2025-11-04.22:23:01
N5028 comment US 6-020
N5028 comment US 7-019

Merge phases 5 and 6, because both deal with the same contiguous sequences of string literals. Then, move the conversion of pp-tokens to tokens into a new phase 6.
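
As a rough illustration of why one phase suffices, the sketch below shows the observable behavior that the merged phase describes: the common encoding-prefix and the concatenation both act on the same run of adjacent string literals, and a ud-suffix is applied to the concatenated result. The names joined, wide, suffixed, joined_size and the literal operator _len are hypothetical and exist only for this example; the behavior shown is the same before and after the renumbering.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical literal operator, introduced only for this illustration.
constexpr std::size_t operator""_len(const char*, std::size_t n) { return n; }

// Adjacent string-literals form a single string-literal before
// preprocessing tokens are converted into tokens.
constexpr std::string_view joined = "phase " "five";
static_assert(joined == "phase five");

// The common encoding-prefix (here L) applies to every literal in the
// sequence, so the result is an array of five wchar_t (four + terminator).
constexpr auto& wide = L"ab" "cd";
static_assert(std::is_same_v<decltype(wide), const wchar_t (&)[5]>);

// A ud-suffix is applied to the result of the concatenation,
// so the literal operator sees all four characters.
constexpr std::size_t suffixed = "ab" "cd"_len;
static_assert(suffixed == 4);

constexpr std::size_t joined_size() { return joined.size(); }
```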

History
Date                 User   Action  Args
2025-11-06 23:04:52  admin  set     status: tentatively ready -> ready
2025-11-04 22:23:01  admin  set     status: review -> tentatively ready
2025-10-25 08:09:19  admin  set     status: open -> review
2025-10-12 08:55:04  admin  set     messages: + msg8159
2025-10-01 00:00:00  admin  create