Title
Pattern used by std::from_chars is underspecified
Status
new
Section
[charconv.from.chars]
Submitter
Jonathan Wakely

Created on 2020-06-23.00:00:00 last changed 2 months ago

Messages

Date: 2020-07-17.22:37:26

Proposed resolution:

This wording is relative to N4861.

Option A:
  1. Modify [charconv.from.chars] as indicated:

    from_chars_result from_chars(const char* first, const char* last, float& value,
                                 chars_format fmt = chars_format::general);
    from_chars_result from_chars(const char* first, const char* last, double& value,
                                 chars_format fmt = chars_format::general);
    from_chars_result from_chars(const char* first, const char* last, long double& value,
                                 chars_format fmt = chars_format::general);
    

    -6- Preconditions: fmt has the value of one of the enumerators of chars_format.

    -7- Effects: The pattern is the expected form of the subject sequence in the "C" locale, as described for strtod, except that

    1. (7.1) — the sign '+' may only appear in the exponent part;

    2. (7.2) — if fmt has chars_format::scientific set but not chars_format::fixed, the [deleted: otherwise optional exponent part shall appear] [added: exponent part is not optional];

    3. (7.3) — if fmt has chars_format::fixed set but not chars_format::scientific, [deleted: the optional exponent part shall not appear; and] [added: there is no exponent part;]

    4. (?.?) — if fmt is not chars_format::hex, only decimal digits and an optional '.' appear before the exponent part (if any); and

    5. (7.4) — if fmt is chars_format::hex, the prefix "0x" or "0X" is assumed. [Example: The string 0x123 is parsed to have the value 0 with remaining characters x123. — end example]

    In any case, the resulting value is one of at most two floating-point values closest to the value of the string matching the pattern.
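The restrictions in items 7.1 to 7.3 can be probed with a small helper. This is a minimal sketch, assuming a standard library that provides the floating-point overloads of std::from_chars and follows the intended reading; the name parses_completely is hypothetical and not part of <charconv>.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>

// Hypothetical helper: true only if from_chars succeeds AND consumes
// every character of `text` under the given format.
inline bool parses_completely(std::string_view text,
                              std::chars_format fmt = std::chars_format::general) {
    double value = 0.0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value, fmt);
    return ec == std::errc{} && ptr == last;
}
```

Under the wording above, a leading '+' as in "+1.5" is not part of the pattern, while "1.5e+2" is accepted because the '+' appears in the exponent part. With chars_format::scientific the string "1.5" does not match at all, and with chars_format::fixed the string "1.5e2" matches only up to the 'e'.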

Option B:
  1. Modify [charconv.from.chars] as indicated:

    from_chars_result from_chars(const char* first, const char* last, float& value,
                                 chars_format fmt = chars_format::general);
    from_chars_result from_chars(const char* first, const char* last, double& value,
                                 chars_format fmt = chars_format::general);
    from_chars_result from_chars(const char* first, const char* last, long double& value,
                                 chars_format fmt = chars_format::general);
    

    -6- Preconditions: fmt has the value of one of the enumerators of chars_format.

    -7- Effects: [deleted: The pattern is the expected form of the subject sequence in the "C" locale, as described for strtod, except that] [added: The pattern is an optional '-' sign followed by one of:]

    1. (7.1) — [deleted: the sign '+' may only appear in the exponent part] [added: INF or INFINITY, ignoring case];

    2. (7.2) — [deleted: if fmt has chars_format::scientific set but not chars_format::fixed, the otherwise optional exponent part shall appear] [added: if numeric_limits<T>::has_quiet_NaN is true, NAN or NAN(n-char-sequence_opt), ignoring case in the NAN part, where:]

      n-char-sequence:
             digit
             nondigit
             n-char-sequence digit
             n-char-sequence nondigit
      

      ;

    3. (7.3) — [deleted: if fmt has chars_format::fixed set but not chars_format::scientific, the optional exponent part shall not appear; and] [added: if fmt is equal to chars_format::scientific, a sequence of characters matching chars-format-dec exponent-part, where:]

      chars-format-dec:
               fractional-constant
               digit-sequence
      

      ;

    4. (7.4) — [deleted: if fmt is chars_format::hex, the prefix "0x" or "0X" is assumed. [Example: The string 0x123 is parsed to have the value 0 with remaining characters x123. — end example]] [added: if fmt is equal to chars_format::fixed, a sequence of characters matching chars-format-dec;]

    5. (?.?) — if fmt is equal to chars_format::general, a sequence of characters matching chars-format-dec exponent-part_opt; or

    6. (?.?) — if fmt is equal to chars_format::hex, a sequence of characters matching chars-format-hex binary-exponent-part_opt, where:

      chars-format-hex:
               hexadecimal-fractional-constant
               hexadecimal-digit-sequence
      

      [Note: The pattern is derived from the subject sequence in the "C" locale for strtod, with the value of fmt limiting which forms of the subject sequence are recognized, and with no 0x or 0X prefix recognized. — end note]

    [added: For a character sequence INF, INFINITY, NAN, or NAN(n-char-sequence_opt) the resulting value is obtained as if by evaluating strtod(string(first, last).c_str(), nullptr) in the "C" locale. In all other cases] [deleted: In any case], the resulting value is one of at most two floating-point values closest to the value of the string matching the pattern.
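The numeric alternatives of the Option B pattern can be sketched as a small hand-written matcher that reports how many leading characters match. This is only an illustration under stated assumptions: Fmt, is_digit, mantissa_length, exponent_length and pattern_length are hypothetical names, the INF and NAN alternatives are omitted, and no value is computed.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical enum standing in for the four chars_format values.
enum class Fmt { scientific, fixed, general, hex };

inline bool is_digit(char c, bool hex) {
    const unsigned char u = static_cast<unsigned char>(c);
    return hex ? std::isxdigit(u) != 0 : std::isdigit(u) != 0;
}

// Length of a leading chars-format-dec / chars-format-hex match:
// "digits", "digits.", ".digits" or "digits.digits"; 0 if none.
inline std::size_t mantissa_length(std::string_view s, bool hex) {
    std::size_t i = 0;
    while (i < s.size() && is_digit(s[i], hex)) ++i;
    const std::size_t int_digits = i;
    if (i < s.size() && s[i] == '.') {
        std::size_t j = i + 1;
        while (j < s.size() && is_digit(s[j], hex)) ++j;
        const std::size_t frac_digits = j - (i + 1);
        // A lone '.' with no digits on either side is not a match.
        return (int_digits + frac_digits == 0) ? 0 : j;
    }
    return int_digits;
}

// Length of a leading exponent part: marker ('e' or 'p', either case),
// optional sign, at least one decimal digit; 0 if none.
inline std::size_t exponent_length(std::string_view s, char marker) {
    if (s.empty() || std::tolower(static_cast<unsigned char>(s[0])) != marker)
        return 0;
    std::size_t i = 1;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;
    const std::size_t start = i;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
    return i > start ? i : 0;
}

// Number of leading characters of s matching the pattern for fmt, or 0.
inline std::size_t pattern_length(std::string_view s, Fmt fmt) {
    const std::size_t sign = (!s.empty() && s[0] == '-') ? 1 : 0;  // only '-' is allowed
    const std::string_view rest = s.substr(sign);
    const bool hex = (fmt == Fmt::hex);
    const std::size_t m = mantissa_length(rest, hex);
    if (m == 0) return 0;
    std::size_t e = exponent_length(rest.substr(m), hex ? 'p' : 'e');
    if (fmt == Fmt::scientific && e == 0) return 0;  // exponent part is required
    if (fmt == Fmt::fixed) e = 0;                    // exponent part is never consumed
    return sign + m + e;
}
```

Because no 0x or 0X prefix is recognized, "0x123" matches only its leading "0" for Fmt::general, Fmt::fixed and Fmt::hex, and does not match at all for Fmt::scientific.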

Date: 2020-07-15.00:00:00

[ 2020-07-17; Priority set to 3 in telecon ]

Date: 2020-07-15.00:00:00

[ 2020-07-14; Jonathan fixes the strtod call in Option B ]

Date: 2020-06-29.16:06:05

The intention of [charconv.from.chars] p7 is that the fmt argument modifies the expected pattern, so that only a specific subset of valid strtod patterns is recognized for each format. This is not clear from the wording.

When fmt == chars_format::fixed no exponent is to be used, so any trailing characters that match the form of a strtod exponent are ignored. For example, "1.23e4" should produce the result 1.23 for the fixed format. The current wording says "the optional exponent part shall not appear" which can be interpreted to mean that "1.23e4" violates a precondition and so has undefined behaviour!

When fmt != chars_format::hex only decimal numbers should be recognized. This means that for any format except scientific, "0x123" produces 0.0 (it's invalid when fmt == chars_format::scientific because there's no exponent). The current wording says only that when hex is used the string has an assumed "0x" prefix and so is interpreted as a hexadecimal float; it doesn't say that when fmt != chars_format::hex the string is not interpreted as a hexadecimal float.
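The two cases above can be checked against an implementation with a thin wrapper. This is a minimal sketch, assuming a standard library that provides the double overload of std::from_chars; Parsed and parse_double are hypothetical names introduced only for illustration.

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Hypothetical result type for illustration; not part of <charconv>.
struct Parsed {
    double value = 0.0;    // value written by from_chars (stays 0.0 if nothing matched)
    std::size_t used = 0;  // number of characters consumed
    bool ok = false;       // true when ec == std::errc{}
};

// Hypothetical wrapper around the double overload of std::from_chars.
inline Parsed parse_double(std::string_view text, std::chars_format fmt) {
    Parsed result;
    const char* first = text.data();
    const char* last = first + text.size();
    const std::from_chars_result r = std::from_chars(first, last, result.value, fmt);
    result.used = static_cast<std::size_t>(r.ptr - first);
    result.ok = (r.ec == std::errc{});
    return result;
}
```

Under the intended reading, parse_double("1.23e4", std::chars_format::fixed) should report the value 1.23 with four characters consumed, leaving "e4" unread. For "0x123", the general, fixed and hex formats should report 0.0 with one character consumed, and the scientific format should report failure.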

Two alternative resolutions are provided: one is a minimal fix, and the other attempts to make the wording clearer by not referring to a modified version of the C rules.

History
Date                 User   Action  Args
2020-07-17 22:37:26  admin  set     messages: + msg11388
2020-07-14 16:33:08  admin  set     messages: + msg11370
2020-06-26 13:10:02  admin  set     messages: + msg11348
2020-06-23 00:00:00  admin  create