Issue 1103: Reversion of phase 1 and 2 transformations in raw string literals

Title: Reversion of phase 1 and 2 transformations in raw string literals
Status: c++11
Section: 5.2 [lex.phases]
Submitter: US

Created on 2010-08-02.00:00:00 last changed 146 months ago

Messages

msg3140

Date: 2010-11-15.00:00:00

[Voted into the WP at the November, 2010 meeting.]

msg2752

Date: 2010-08-15.00:00:00

Proposed resolution (August, 2010):

Change 5.2 [lex.phases] paragraph 1 phase 1 as follows:

...(An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the \uXXXX notation), are handled equivalently except where this replacement is reverted in a raw string literal.)
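The phase 1 wording can be illustrated with a small C++ sketch, which is not part of the proposed resolution. It assumes a C++11-or-later compiler and a source file that is saved and read as UTF-8 (the GCC and Clang default). The names `direct`, `as_ucn`, `raw_direct`, `same_bytes`, and `check_extended_character` are hypothetical. The sketch shows that U+00E9 written directly and written as `\u00E9` give the same bytes in a u8 literal, and that the directly written character comes through unchanged inside a raw string literal. `check_extended_character()` asserts both claims at run time.

```cpp
#include <cassert>
#include <cstring>

// Assumption: this file is saved as UTF-8 and the compiler reads it as
// UTF-8. u8 literals are always UTF-8 encoded.

// U+00E9 written directly as an extended character in a non-raw literal.
const auto& direct = u8"é";
// The same character written as a universal-character-name.
const auto& as_ucn = u8"\u00E9";
// The extended character inside a raw string literal: the phase 1
// replacement is reverted, so the literal holds the character itself.
const auto& raw_direct = u8R"(é)";

// True when two literals have the same size and the same bytes.
template <typename A, typename B>
bool same_bytes(const A& a, const B& b) {
    return sizeof a == sizeof b && std::memcmp(a, b, sizeof a) == 0;
}

void check_extended_character() {
    assert(same_bytes(direct, as_ucn));      // handled equivalently
    assert(same_bytes(raw_direct, as_ucn));  // reversion keeps the character
}
```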

Change 5.2 [lex.phases] paragraph 1 phase 3 as follows:

...[Example: see the handling of < within a #include preprocessing directive. —end example] ~~Within the r-char-sequence of a raw string literal, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted.~~

Change 5.2 [lex.phases] paragraph 1 phase 5 as follows:

Each source character set member ~~and universal-character-name~~ in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set (5.13.3 [lex.ccon], 5.13.5 [lex.string]); if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.
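A minimal sketch of the phase 5 distinction, assuming any C++11-or-later compiler (the names `cooked`, `raw`, and `check_escape_handling` are hypothetical): an escape sequence is converted in a non-raw string literal but stays as ordinary characters in a raw one.

```cpp
#include <cassert>
#include <cstring>

// Non-raw literal: phase 5 converts the escape sequence \n into one
// newline character of the execution character set.
const char cooked[] = "a\nb";

// Raw literal: escape sequences are not converted, so the backslash and
// the letter n stay as two ordinary characters.
const char raw[] = R"(a\nb)";

// Compile-time check of the array sizes (each includes the terminating null).
static_assert(sizeof cooked == 4, "a, newline, b, null");
static_assert(sizeof raw == 5, "a, backslash, n, b, null");

// Run-time check of the individual characters.
void check_escape_handling() {
    assert(cooked[1] == '\n');
    assert(raw[1] == '\\' && raw[2] == 'n');
    assert(std::strcmp(raw, "a\\nb") == 0);
}
```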

Change 5.3.1 [lex.charset] paragraph 2 as follows:

...Additionally, if the hexadecimal value for a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal corresponds to a control character (in either of the ranges 0x00–0x1F or 0x7F–0x9F, both inclusive) or to a character in the basic source character set, the program is ill-formed. [Footnote: A sequence of characters resembling a universal-character-name in an r-char-sequence (5.13.5 [lex.string]) does not form a universal-character-name. —end footnote]
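The footnote can be checked with a short sketch, assuming a C++11-or-later compiler (the names `ucn`, `not_ucn`, and `check_ucn_spelling` are hypothetical). The same six characters form a universal-character-name in a non-raw literal but not in a raw one.

```cpp
#include <cassert>
#include <cstring>

// Non-raw literal: the six characters form a universal-character-name for
// U+00E9, which a u8 literal encodes as the two UTF-8 bytes 0xC3 0xA9.
const auto& ucn = u8"\u00E9";

// Raw literal: the same characters do not form a universal-character-name;
// they remain a backslash, the letter u, and four hex digits.
const char not_ucn[] = R"(\u00E9)";

void check_ucn_spelling() {
    assert(sizeof ucn == 3);  // two UTF-8 bytes plus the terminating null
    assert(static_cast<unsigned char>(ucn[0]) == 0xC3);
    assert(static_cast<unsigned char>(ucn[1]) == 0xA9);
    assert(std::strlen(not_ucn) == 6);
    assert(std::strcmp(not_ucn, "\\u00E9") == 0);
}
```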

Change 5.5 [lex.pptoken] paragraph 3 as follows:

If the input stream has been parsed into preprocessing tokens up to a given character:

~~if~~ If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal~~;~~. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified. The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern

encoding-prefix_opt R raw-string

~~otherwise~~ Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail.
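This sketch illustrates two consequences of the revised [lex.pptoken] wording. It assumes a C++11-or-later compiler; the macro `R` and the names `as_raw`, `as_macro`, `spliced`, and `check_raw_prefix` are hypothetical. First, `R"` begins a raw string literal even when `R` is defined as a macro. Second, a backslash at the end of a line inside a raw string is kept rather than spliced. The continuation line of `spliced` must start in column 1, because any indentation would become part of the literal.

```cpp
#include <cassert>
#include <cstring>

// An object-like macro named R, defined only to show where it is NOT used.
#define R "macro"

// R immediately followed by a double quote could begin a raw string
// literal, so the next preprocessing token is the raw string literal;
// the macro R is not expanded here.
const char* const as_raw = R"(text)";

// A separate identifier R (followed by a space) does expand, giving the
// concatenation of "macro" and "(text)".
const char* const as_macro = R "(text)";

#undef R

// Line splicing (phase 2) is reverted inside the raw string: the
// backslash and the newline after it both remain in the literal.
const char* const spliced = R"(one\
two)";

void check_raw_prefix() {
    assert(std::strcmp(as_raw, "text") == 0);
    assert(std::strcmp(as_macro, "macro(text)") == 0);
    assert(std::strcmp(spliced, "one\\\ntwo") == 0);
}
```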

Delete footnote 24 in 5.13.5 [lex.string] paragraph 2:

~~Use of characters with trigraph equivalents in a d-char-sequence may produce unintended results.~~

Insert the following examples after 5.13.5 [lex.string] paragraph 4:

[Example: The raw string
  R"a(
  )\
  a"
  )a"
is equivalent to "\n)\\\na\"\n". The raw string
  R"(??)"
is equivalent to "\?\?". The raw string
  R"#(
  )??="
  )#"
is equivalent to "\n)\?\?=\"\n". —end example]
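The three examples can be checked mechanically. This sketch assumes a C++11-or-later compiler, and `ex1`, `ex2`, `ex3`, and `check_examples` are hypothetical names. In C++11 and C++14, where trigraphs are still replaced outside raw strings, the second and third literals would not be lexed as intended without the reversion. C++17 removed trigraphs altogether, so in that mode those literals are unaffected either way. The lines inside each raw literal must start in column 1, because leading spaces would become part of the string.

```cpp
#include <cassert>
#include <string>

// Example 1: reverting the line splice keeps the backslash and newline,
// so the delimiter )a" is found only on the last line.
const std::string ex1 = R"a(
)\
a"
)a";

// Example 2: with trigraph replacement reverted, the two question marks
// are not combined with the following ) into a trigraph.
const std::string ex2 = R"(??)";

// Example 3: likewise, the two question marks and = stay as three
// characters instead of becoming a trigraph for #.
const std::string ex3 = R"#(
)??="
)#";

void check_examples() {
    assert(ex1 == "\n)\\\na\"\n");
    assert(ex2 == "\?\?");
    assert(ex3 == "\n)\?\?=\"\n");
}
```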

msg2751

Date: 2010-08-02.00:00:00

N3092 comment US 13
N3092 comment US 14

“Raw” strings are still only Pittsburgh-rare strings: the reversion in phase 3 only applies to an r-char-sequence. It should apply to the entire raw string literal.

History
Date	User	Action	Args
2014-03-03 00:00:00	admin	set	status: fdis -> c++11
2011-04-10 00:00:00	admin	set	status: dr -> fdis
2010-11-29 00:00:00	admin	set	messages: + msg3140
2010-11-29 00:00:00	admin	set	status: ready -> dr
2010-08-23 00:00:00	admin	set	messages: + msg2752
2010-08-02 00:00:00	admin	create