Title
Handling of header-names for #include and #embed
Status
open
Section
15.3 [cpp.include]
Submitter
Jens Maurer

Created on 2025-03-23.00:00:00 last changed 1 week ago

Messages

Date: 2025-03-26.18:46:53

There is non-parallel treatment of the header-name in #include vs. #embed directives.

First, subclause 5.5 [lex.pptoken] paragraph 4.3 is missing a special-case treatment for #embed.

Second, 15.3 [cpp.include] (and thus 15.4.1 [cpp.embed.gen]) should acknowledge that lexing has completed at that point, and thus talk about header-name preprocessing tokens, not about sequences of characters.

Third, 15.4.1 [cpp.embed.gen] paragraph 11 talks about "resource name preprocessing tokens", which do not exist (see 5.5 [lex.pptoken]). Also, it should be clarified that this rule applies only to the general pp-tokens form of #embed.

Fourth, __has_embed has this aberration:

  #define stdio nosuch
  #if __has_embed(<stdio.h>)    // looks for nosuch.h
  #embed <stdio.h>              // looks for stdio.h
  #endif

For __has_include, this is avoided by using two grammar productions, where the preferred one uses header-name (15.2 [cpp.cond]).
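
The __has_include side of this comparison can be checked directly. The following is a minimal sketch, assuming a C++17 implementation on which <stdio.h> can be found; the constant name found_without_expansion is hypothetical and exists only for illustration.

```cpp
// The macro mirrors the __has_embed example above. It must not affect the
// lookup below, because <stdio.h> is lexed as a single header-name token.
#define stdio nosuch

#if __has_include(<stdio.h>)   // first form: header-name, no macro replacement
constexpr bool found_without_expansion = true;
#else
constexpr bool found_without_expansion = false;
#endif

#undef stdio   // keep the illustrative macro from leaking into later code
```

Under the first grammar production, the constant is expected to be true even though stdio is defined as a macro, which is the behavior __has_embed currently lacks.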

Fifth, for __has_include, it is unclear whether only the first (non-macro-expanded) preprocessing token should be eligible for special header-name treatment. There is implementation divergence.

Sixth, for the following example:

  #embed "foo\" vendor_specific_arg("something else") ...

the rule in 5.5 [lex.pptoken] bullet 4.3 would form a string-literal (because it consists of a longer sequence of characters), not the header-name "foo\".
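
The two competing readings can be sketched with a pair of toy scanners. This is an illustration under stated assumptions, not how any implementation lexes: the function names are hypothetical, only the quoted forms are handled, and a backslash inside a q-char-sequence is treated as an ordinary character (its meaning there is conditionally-supported).

```cpp
#include <cstddef>
#include <string>

// Length of a string-literal candidate starting at text[0] == '"'.
// A backslash escapes the next character. Returns 0 if nothing matches.
std::size_t scan_string_literal(const std::string& text) {
    if (text.empty() || text[0] != '"') return 0;
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (text[i] == '\\') { ++i; continue; }  // skip the escaped character
        if (text[i] == '\n') return 0;           // no new-line inside the literal
        if (text[i] == '"') return i + 1;        // closing quote found
    }
    return 0;
}

// Length of a header-name candidate of the form " q-char-sequence ".
// Assumption: a backslash is just another q-char. Returns 0 if nothing matches.
std::size_t scan_quoted_header_name(const std::string& text) {
    if (text.empty() || text[0] != '"') return 0;
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (text[i] == '\n') return 0;
        if (text[i] == '"') return i > 1 ? i + 1 : 0;  // sequence must be non-empty
    }
    return 0;
}
```

Applied to the directive text above, scan_quoted_header_name stops after "foo\", whereas scan_string_literal runs on to the quote that opens "something else", so a pure longest-match rule prefers the string-literal.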

Seventh, it is unclear how a q-char-sequence is supposed to be turned into an h-char-sequence when falling back to header search in 15.3 [cpp.include] paragraph 3, given that a q-char-sequence might contain a > character, in which case it does not match the production h-char-sequence.
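
The mismatch can be made concrete with two small predicates. This is a sketch only; the function names are hypothetical, and the checks follow the character exclusions in 5.6 [lex.header] (no new-line in either form, no > in an h-char-sequence, no " in a q-char-sequence).

```cpp
#include <string>

// True if chars is a valid h-char-sequence: non-empty, no new-line, no '>'.
bool is_h_char_sequence(const std::string& chars) {
    return !chars.empty() && chars.find_first_of(">\n") == std::string::npos;
}

// True if chars is a valid q-char-sequence: non-empty, no new-line, no '"'.
bool is_q_char_sequence(const std::string& chars) {
    return !chars.empty() && chars.find_first_of("\"\n") == std::string::npos;
}
```

A name such as a>b.h satisfies is_q_char_sequence but not is_h_char_sequence, so a fallback phrased as reprocessing the directive with < and > delimiters has no valid spelling for it.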

Possible resolution:

  1. Change in 5.5 [lex.pptoken] bullet 4.3 and add bullets as follows:

    • ...
    • Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that
      • a header-name (5.6 [lex.header]) is only formed
        • immediately after the include, embed, or import preprocessing token in a #include (15.3 [cpp.include]), #embed (15.4 [cpp.embed]), or import (15.6 [cpp.import]) directive, respectively, or
        • within a has-include-expression immediately after a preprocessing token sequence of __has_include or __has_embed immediately followed by ( in a #if, #elif, or #embed directive (15.2 [cpp.cond], 15.4 [cpp.embed]) and
      • a string-literal token is never formed when a header-name token can be formed.
  2. Change in 15.2 [cpp.cond] before paragraph 1:

      has-embed-expression:
             __has_embed ( header-name pp-balanced-token-seqopt )
             __has_embed ( header-name-tokens pp-balanced-token-seqopt )
    
  3. Change in 15.2 [cpp.cond] paragraph 3 and paragraph 4 as follows:

    The second form of has-include-expression is considered only if the first form does not match, in which case the preprocessing tokens are processed just as in normal text.

    The header or source file identified by the parenthesized preprocessing token sequence in each contained has-include-expression is searched for as if that preprocessing token sequence were the pp-tokens in of a #include directive, except that no further macro expansion is performed. If such a directive would not satisfy the syntactic requirements of a #include directive, the program is ill-formed. The has-include-expression evaluates to 1 if the search for the source file succeeds, and to 0 if the search fails.

  4. Change in 15.2 [cpp.cond] paragraph 5 as follows:

    The parenthesized pp-balanced-token-seq in preprocessing token sequence of each contained has-embed-expression is processed as if that pp-balanced-token-seq preprocessing token sequence were the pp-tokens in the third form of a #embed directive (15.4 [cpp.embed]), except that no further macro expansion is performed. If such a directive would not satisfy the syntactic requirements of a #embed directive, the program is ill-formed. ...
  5. Change in 15.3 [cpp.include] paragraph 1 through paragraph 4 as follows:

    A #include directive shall identify a header or source file that can be processed by the implementation.

    A header search for a sequence of characters searches a sequence of implementation-defined places for a header identified uniquely by that sequence of characters. How the places are specified or the header identified is implementation defined.

    A source file search for a sequence of characters attempts to identify a source file that is named by the sequence of characters. The named source file is searched for in an implementation-defined manner. If the implementation does not support a source file search for that sequence of characters, or if the search fails, the result of the source file search is the result of a header search for the same sequence of characters.

    A preprocessing directive of the form

      # include < h-char-sequence > header-name new-line
    
    causes the replacement of that directive by the entire contents of the header or source file identified by header-name.

    If the header-name is of the form

      < h-char-sequence >
    
    searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined a header is identified by a header search for the sequence of characters of the h-char-sequence.

    A preprocessing directive If the header-name is of the form

      # include " q-char-sequence " new-line
    
    causes the replacement of that directive by the entire contents of the source file identified by the specified sequence between the " delimiters. The named source file is searched for in an implementation-defined manner. If this search is not supported, or if the search fails, the directive is reprocessed as if it read
      # include < h-char-sequence > new-line
    
    with the identical contained sequence (including > characters, if any) from the original directive the source file or header is identified by a source file search for the sequence of characters of the q-char-sequence.

    If a header search fails, or if a source file search or header search identifies a header or source file that cannot be processed by the implementation, the program is ill-formed.

    A preprocessing directive of the form

      # include pp-tokens new-line
    
    (that does not match one of the two previous forms form) is permitted. The preprocessing tokens after include in the directive are processed just as in normal text (i.e., each identifier currently defined as a macro name is replaced by its replacement list of preprocessing tokens). Then, an attempt is made to form a header-name preprocessing token (5.6 [lex.header]) from the characters of the spellings of the resulting sequence of preprocessing tokens; the treatment of whitespace is implementation-defined. If the attempt succeeds, the directive with the so-formed header-name is processed as specified for the previous form. Otherwise directive resulting after all replacements does not match one of the two previous forms, the behavior is undefined.

    [Note 1: Adjacent string-literals are not concatenated into a single string-literal (see the translation phases in 5.2 [lex.phases]); thus, an expansion that results in two string-literals is an invalid directive. —end note]

    The method by which a sequence of preprocessing tokens between a < and a > preprocessing token pair or a pair of " characters is combined into a single header name preprocessing token is implementation-defined.

  6. Change in 15.4.1 [cpp.embed.gen] paragraph 1 and paragraph 2 as follows:

    A bracket resource search for a sequence of characters searches a sequence of implementation-defined places for a resource identified uniquely by that sequence of characters. How the places are specified or the header identified is implementation defined.

    A quote resource search for a sequence of characters attempts to identify a resource that is named by the sequence of characters. The named resource is searched for in an implementation-defined manner. If the implementation does not support a quote resource search for that sequence of characters, or if the search fails, the result of the quote resource search is the result of a bracket resource search for the same sequence of characters.

    A preprocessing directive of the form

      # embed < h-char-sequence > header-name pp-tokensopt new-line
    
    causes the replacement of that directive by data from the resource identified by header-name.

    If the header-name is of the form

      < h-char-sequence >
    
    searches a sequence of implementation-defined places for a resource identified uniquely by the specified sequence between the < and > delimiters. How the places are specified or the resource identified is implementation-defined the resource is identified by a bracket resource search for the sequence of characters of the h-char-sequence.

    A preprocessing directive If the header-name is of the form

      # embed " q-char-sequence " pp-tokensopt new-line
    
    searches for a resource identified by the specified sequence between the " delimiters. The named resource is searched for in an implementation-defined manner. If this search is not supported, or if the search fails, the directive is reprocessed as if it read
      # embed < h-char-sequence > pp-tokensopt new-line
    
    with the identical contained sequence (including > characters, if any) from the original directive. the resource is identified by a quote resource search for the sequence of characters of the q-char-sequence.

    If a bracket resource search fails, or if a quote or bracket resource search identifies a resource that cannot be processed by the implementation, the program is ill-formed.

  7. Change in 15.4.1 [cpp.embed.gen] paragraph 10 and paragraph 11:

    (10) A preprocessing directive of the form
      # embed pp-tokens new-line
    
    (that does not match one of the two previous forms form) is permitted. The preprocessing tokens after embed in the directive are processed just as in normal text (i.e., each identifier currently defined as a macro name is replaced by its replacement list of preprocessing tokens). The directive resulting after all replacements of the third form shall match one of the two previous forms Then, an attempt is made to form a header-name preprocessing token (5.6 [lex.header]) from the characters of the spellings of the resulting sequence of preprocessing tokens immediately after embed; the treatment of whitespace is implementation-defined. If the attempt succeeds, the directive with the so-formed header-name is processed as specified for the previous form. Otherwise, the program is ill-formed.

    [Note 1: Adjacent string-literals are not concatenated into a single string-literal (see the translation phases in (5.2 [lex.phases])); thus, an expansion that results in two string-literals is an invalid directive. —end note]

    Any further processing as in normal text described for the two previous forms form is not performed. [Note 2: That is, processing as in normal text happens once and only once for the entire directive. —end note]

    (11) [Example 4: If the directive matches the third second form, the whole directive is replaced. If the directive matches the first two forms form, everything after the name is replaced.

      #define prefix(ARG) suffix(ARG)
      #define THE_ADDITION "teehee"
      #define THE_RESOURCE ":3c"
      #embed ":3c"        prefix(THE_ADDITION)
      #embed THE_RESOURCE prefix(THE_ADDITION)
    
      #define EMPTY
      #define X myfile
      #define Y rsc
      #define Z 42
      #embed <myfile.rsc> prefix(Z)
      #embed EMPTY <X.Y>  prefix(Z)
    
    is equivalent to:
      #embed ":3c" suffix("teehee")
      #embed ":3c" suffix("teehee")
    
      #embed <myfile.rsc> prefix(42)
      #embed <myfile.rsc> prefix(42)
    
    end example]

    The method by which a sequence of preprocessing tokens between a < and a > preprocessing token pair or a pair of " characters is combined into a single resource name preprocessing token is implementation-defined.

  8. Change in 15.7.3 [cpp.stringize] paragraph 2 as follows:

    ... Otherwise, the original spelling of each preprocessing token in the stringizing argument is retained in the character string literal, except for special handling for producing the spelling of header-names, string-literals, and character-literals: a \ character is inserted before each " and \ character of a header-name, character-literal, or string-literal (including the delimiting " characters). If the replacement that results is not a valid character string literal, the behavior is undefined. ...
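
The escaping rule in item 8 can be sketched as a small helper. The function name stringize_spelling is hypothetical, and the sketch covers only the spelling of a single header-name, string-literal, or character-literal token; it does not model whitespace handling or other token kinds.

```cpp
#include <string>

// Builds the character string literal spelling for one token spelling:
// a backslash is inserted before each '"' and '\' character, and the
// result is wrapped in double quotes.
std::string stringize_spelling(const std::string& spelling) {
    std::string out = "\"";
    for (char c : spelling) {
        if (c == '"' || c == '\\') out += '\\';  // escape quotes and backslashes
        out += c;
    }
    out += '"';
    return out;
}
```

For the header-name spelling "foo\bar.h" (including its delimiting quotes), the helper yields "\"foo\\bar.h\"". For an angle-bracket header-name, the < and > delimiters are copied unchanged.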
History
Date                 User   Action  Args
2025-03-23 00:00:00  admin  create