Issue 2381: Inconsistency in parsing floating point numbers

Title: Inconsistency in parsing floating point numbers
Status: c++23
Section: [facet.num.get.virtuals]
Submitter: Marshall Clow

Created on 2014-04-30.00:00:00 last changed 19 months ago

Messages

msg12108

Date: 2021-10-14.09:56:08

Proposed resolution:

This wording is relative to N4885.

Change [facet.num.get.virtuals]/3 Stage 2 as indicated:
— Stage 2:
If in == end then stage 2 terminates. Otherwise a charT is taken from in and local variables are initialized as if by
```
char_type ct = *in;
char c = src[find(atoms, atoms + sizeof(src) - 1, ct) - atoms];
if (ct == use_facet<numpunct<charT>>(loc).decimal_point())
  c = '.';
bool discard =
  ct == use_facet<numpunct<charT>>(loc).thousands_sep()
  && use_facet<numpunct<charT>>(loc).grouping().length() != 0;
```
where the values src and atoms are defined as if by:
```
static const char src[] = "0123456789abcdefpxABCDEFPX+-";
char_type atoms[sizeof(src)];
use_facet<ctype<charT>>(loc).widen(src, src + sizeof(src), atoms);
```
for this value of loc.
If discard is true, then if '.' has not yet been accumulated, then the position of the character is remembered, but the character is otherwise ignored. Otherwise, if '.' has already been accumulated, the character is discarded and Stage 2 terminates. If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1. If so, it is accumulated.
If the character is either discarded or accumulated then in is advanced by ++in and processing returns to the beginning of stage 2.
[Example:
Given an input sequence of "0x1a.bp+07p",
- if the conversion specifier returned by Stage 1 is %d, "0" is accumulated;
- if the conversion specifier returned by Stage 1 is %i, "0x1a" are accumulated;
- if the conversion specifier returned by Stage 1 is %g, "0x1a.bp+07" are accumulated.
In all cases, the remainder is left in the input.
— end example]
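The narrowing step can be sketched for charT = char as follows. This is a minimal sketch of the "as if" code above, not an implementation from any library. The helper name stage2_narrow is hypothetical. It models only the narrowing and the discard flag. The subsequent check of whether c may be the next character of the field is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper mirroring the Stage 2 "as if" code for charT = char.
// Returns the narrowed char c ('\0' when ct is not one of the atoms) and
// sets discard when ct is the thousands separator of a locale with grouping.
char stage2_narrow(char ct, const std::locale& loc, bool& discard) {
    static const char src[] = "0123456789abcdefpxABCDEFPX+-";
    char atoms[sizeof(src)];
    std::use_facet<std::ctype<char>>(loc).widen(src, src + sizeof(src), atoms);

    const auto& np = std::use_facet<std::numpunct<char>>(loc);
    // If ct is not among the 28 atoms, find returns atoms + 28, which
    // selects the terminating '\0' of src.
    char c = src[std::find(atoms, atoms + sizeof(src) - 1, ct) - atoms];
    if (ct == np.decimal_point())
        c = '.';
    discard = ct == np.thousands_sep() && np.grouping().length() != 0;
    return c;
}
```

With std::locale::classic(), 'P' narrows to 'P' and 'i' narrows to '\0'. ',' is not discarded because the classic locale's grouping is empty.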
Add the following new subclause to [diff.cpp03]:

C.4.? [locale]: localization library [diff.cpp03.locale]
Affected subclause: [facet.num.get.virtuals]
Change: The num_get facet recognizes hexadecimal floating point values.
Rationale: Required by new feature.
Effect on original feature: Valid C++2003 code may have different behavior in this revision of C++.

msg12055

Date: 2021-10-14.00:00:00

[ 2021-10-14 Approved at October 2021 virtual plenary. Status changed: Voting → WP. ]

msg11819

Date: 2021-09-15.00:00:00

[ 2021-09-20; Reflector poll ]

Set status to Tentatively Ready after eight votes in favour during reflector poll.

msg10112

Date: 2021-05-18.00:00:00

[ 2021-05-18 Tim updates wording ]

Based on the git history, libc++ appears to have always included p and P in src.

msg8504

Date: 2018-08-23.00:00:00

[ 2018-08-23 Batavia Issues processing ]

Needs an Annex C entry. Tim to write Annex C.

Previous resolution [SUPERSEDED]:

This wording is relative to N4606.

Change [facet.num.get.virtuals]/3 Stage 2 as indicated:

```
static const char src[] = "0123456789abcdefpxABCDEFPX+-";
```

Append the following examples to [facet.num.get.virtuals]/3 Stage 2 as indicated:

[Example:
Given an input sequence of "0x1a.bp+07p",
- if Stage 1 returns %d, "0" is accumulated;
- if Stage 1 returns %i, "0x1a" are accumulated;
- if Stage 1 returns %g, "0x1a.bp+07" are accumulated.
In all cases, leaving the rest in the input.
— end example]

msg8356

Date: 2016-09-15.00:00:00

[ 2016-09-08, Zhihao Yuan comments and updates proposed wording ]

Examples added.

msg8320

Date: 2016-08-03.12:32:27

[ 2016-08, Chicago ]

Tues PM: Move to Open

msg8319

Date: 2016-08-02.20:06:29

[ 2016-08, Chicago ]

Zhihao provides wording

The src array in Stage 2 only performs narrowing. The actual input validation is delegated to strtold (independently of the parsing in Stage 3, which is also delegated to strtold) by the sentence:

[...] If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1.

So a conforming C++11 num_get is supposed to magically accept a hexfloat without an exponent,

0x3.AB

because we refer to C99, and the fix for this issue should simply be to expand the src array.
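The claim about "0x3.AB" can be checked against std::strtod directly. The sketch below assumes a C99-conforming strtod, which C++11 and later reference. The helper parse_prefix is hypothetical. It reports how many characters strtod consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: converts the longest valid prefix of field with
// std::strtod and stores the number of characters consumed.
double parse_prefix(const char* field, std::size_t& consumed) {
    char* end = nullptr;
    double value = std::strtod(field, &end);
    consumed = static_cast<std::size_t>(end - field);
    return value;
}
```

"0x3.AB" is read in full as 3 + 10/16 + 11/256 = 3.66796875, with no binary exponent. For "0x1a.bp+07p", strtod stops before the trailing 'p' and yields 26.6875 * 2^7 = 3416. This mirrors the %g case of the example in the proposed resolution. strtod's longest-prefix rule is only an approximation of the character-by-character Stage 2 check.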

Support for Infs and NaNs is not proposed because of the complexity of nan(n-chars).
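Part of that complexity is that the n-char-sequence may contain any letter, digit, or underscore. It also uses '(' and ')', and most of these characters are not in the Stage 2 src table. The sketch below assumes a C99-conforming strtod. The helper strtod_reads_whole_nan is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Hypothetical helper: true if std::strtod consumes all of text and the
// result is a NaN.
bool strtod_reads_whole_nan(const char* text) {
    char* end = nullptr;
    double v = std::strtod(text, &end);
    return std::isnan(v) && *end == '\0';
}
```

A field such as "nan(abc_123)" is consumed in full by strtod. Its characters 'n', '(', '_', and ')' would all narrow to '\0' in Stage 2.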

msg8058

Date: 2016-04-16.04:56:37

[ 2016-04, Issues Telecon ]

People are much more interested in round-tripping hex floats than handling inf and nan. Priority changed to P2.

Marshall says he'll try to write some wording, noting that this is a very closely specified part of the standard, and has remained unchanged for a long time. Also, there will need to be a sample implementation.

msg6940

Date: 2014-04-30.00:00:00

In [facet.num.get.virtuals] we have:

Stage 3: The sequence of chars accumulated in stage 2 (the field) is converted to a numeric value by the rules of one of the functions declared in the header <cstdlib>:

For a signed integer value, the function strtoll.

For an unsigned integer value, the function strtoull.

For a floating-point value, the function strtold.
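Stage 3 might dispatch on the target type as in the following minimal sketch, which assumes C++17 (if constexpr). The name stage3_convert and its base parameter are hypothetical. In the real facet the base follows from the Stage 1 conversion specifier (for example 10 for %d, 0 for %i). Error handling is omitted, including ERANGE, an empty field, and range checks when narrowing to T.

```cpp
#include <cassert>
#include <cstdlib>
#include <type_traits>

// Hypothetical Stage 3 sketch: convert the accumulated field with the
// <cstdlib> function the wording names for the target type T.
template <class T>
T stage3_convert(const char* field, int base = 10) {
    if constexpr (std::is_floating_point_v<T>) {
        return static_cast<T>(std::strtold(field, nullptr));        // floating point
    } else if constexpr (std::is_signed_v<T>) {
        return static_cast<T>(std::strtoll(field, nullptr, base));  // signed integer
    } else {
        return static_cast<T>(std::strtoull(field, nullptr, base)); // unsigned integer
    }
}
```

Because the floating-point branch uses strtold, a field such as "0x1p-2" converts to 0.25 once Stage 2 lets it through.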

This implies that in many cases, this routine should return true:

```
#include <cmath>
#include <cstdlib>
#include <sstream>
#include <string>

bool is_same(const char* p)
{
  std::string str{p};
  double val1 = std::strtod(str.c_str(), nullptr);
  std::stringstream ss(str);
  double val2 = 0.0;
  ss >> val2;
  return std::isinf(val1) == std::isinf(val2) &&                 // both infinite or both not
         std::isnan(val1) == std::isnan(val2) &&                 // both NaN or both not
         (std::isinf(val1) || std::isnan(val1) || val1 == val2); // finite values must be equal
}
```

and this is indeed true for many strings:

```
assert(is_same("0"));
assert(is_same("1.0"));
assert(is_same("-1.0"));
assert(is_same("100.123"));
assert(is_same("1234.456e89"));
```

but not for others:

```
assert(is_same("0xABp-4")); // hex float
assert(is_same("inf"));
assert(is_same("+inf"));
assert(is_same("-inf"));
assert(is_same("nan"));
assert(is_same("+nan"));
assert(is_same("-nan"));

assert(is_same("infinity"));
assert(is_same("+infinity"));
assert(is_same("-infinity"));
```

These are all strings that are correctly parsed by std::strtod, but not by the stream extraction operators. They contain characters that are deemed invalid in stage 2 of parsing.
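To see which character stops Stage 2, one can scan for the first character outside the src table. The helper first_outside_table is hypothetical and deliberately simplified. It treats '.' as accepted, since Stage 2 maps the decimal point separately. It also ignores the "allowed as the next character" check, so it only shows where narrowing alone would stop. old_src is the C++11 table without p/P, and new_src is the table from the resolution of this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: index of the first character of text that is neither
// '.' nor in src, or text.size() if there is none.
std::size_t first_outside_table(std::string_view text, std::string_view src) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != '.' && src.find(text[i]) == std::string_view::npos)
            return i;
    }
    return text.size();
}

// Stage 2 tables before and after the resolution of this issue.
constexpr std::string_view old_src = "0123456789abcdefxABCDEFX+-";
constexpr std::string_view new_src = "0123456789abcdefpxABCDEFPX+-";
```

With the old table, "0xABp-4" stops at the 'p' (index 4). With the new table, all seven characters are in the table. "inf", "nan", and "infinity" still stop at their first letter under either table, because 'i' and 'n' are not atoms.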

If we're going to say that we're converting by the rules of strtold, then we should accept all the things that strtold accepts.

History
Date	User	Action	Args
2023-11-22 15:47:43	admin	set	status: wp -> c++23
2021-10-14 09:56:08	admin	set	messages: + msg12108
2021-10-14 09:56:08	admin	set	status: voting -> wp
2021-09-29 12:57:28	admin	set	status: ready -> voting
2021-09-20 11:24:27	admin	set	messages: + msg12055
2021-09-20 11:24:27	admin	set	status: open -> ready
2021-05-19 04:45:25	admin	set	messages: + msg11819
2018-08-24 13:31:33	admin	set	messages: + msg10112
2016-09-08 20:57:59	admin	set	messages: + msg8504
2016-08-03 12:32:27	admin	set	messages: + msg8356
2016-08-03 12:32:27	admin	set	status: new -> open
2016-08-02 16:35:01	admin	set	messages: + msg8320
2016-08-02 16:35:01	admin	set	messages: + msg8319
2016-04-16 04:56:37	admin	set	messages: + msg8058
2014-04-30 00:00:00	admin	create