Issue 2752: Excess-precision floating-point literals

Title: Excess-precision floating-point literals
Status: open
Section: 5.13.4 [lex.fcon]
Submitter: Peter Dimov

Created on 2023-06-29.00:00:00, last changed 24 months ago

Messages

msg7342

Date: 2023-06-15.00:00:00

Additional notes (June, 2023)

Forwarded to EWG via cplusplus/papers#1584, by decision of the CWG chair.

msg7341

Date: 2025-03-27.21:45:20

Consider:

  #include <cassert>

  int main()
  {
    constexpr auto x = 3.14f;
    assert( x == 3.14f );         // can fail?
    static_assert( x == 3.14f );  // can fail?
  }

Can a conforming implementation represent a floating-point literal with excess precision, causing the comparisons to fail?

Subclause 5.13.4 [lex.fcon] paragraph 3 specifies:

If the scaled value is not in the range of representable values for its type, the program is ill-formed. Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
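
To make the quoted rule concrete, here is a minimal sketch that finds the two float values between which a given scaled value falls. It assumes that a double is a close enough stand-in for the exact scaled value of the literal (it is only an approximation), and float_neighbors is a hypothetical helper name, not something the standard provides.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>

// Hypothetical helper: returns the smaller and larger float values that
// bracket `scaled`. Both members are equal when `scaled` is exactly
// representable as a float.
// Assumption: `scaled` (a double) approximates the literal's exact value.
inline std::pair<float, float> float_neighbors(double scaled)
{
    // Precondition: the value must lie within the finite range of float.
    assert(std::fabs(scaled) <= std::numeric_limits<float>::max());

    const float inf = std::numeric_limits<float>::infinity();
    const float rounded = static_cast<float>(scaled);  // the cast rounds to float

    if (static_cast<double>(rounded) == scaled)
        return {rounded, rounded};                      // exactly representable
    if (static_cast<double>(rounded) < scaled)
        return {rounded, std::nextafter(rounded, inf)}; // rounded down
    return {std::nextafter(rounded, -inf), rounded};    // rounded up
}
```

For instance, float_neighbors(3.14) yields two adjacent float values that straddle 3.14; under the wording above, the value of 3.14f has to be one of those two.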

This phrasing leaves little leeway for excess precision. In contrast, C23 specifies in section 6.4.4.3 paragraph 6:

The values of floating constants may be represented in greater range and precision than that required by the type (determined by the suffix); the types are not changed thereby. ...

Subclause 7.1 [expr.pre] paragraph 6 allows excess precision for floating-point computations (including their operands):

The values of the floating-point operands and the results of floating-point expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby. [ Footnote: The cast and assignment operators must still perform their specific conversions as described in 7.6.1.4 [expr.type.conv], 7.6.3 [expr.cast], 7.6.1.9 [expr.static.cast] and 7.6.19 [expr.assign]. -- end footnote ]

Taken together, that means that 314.f / 100.f can be computed and represented more precisely than 3.14f, which is hard to justify. The footnote appears to imply that (float)3.14f is required to yield a value with float precision, but that conversion (eventually) ends up at 9.5.1 [dcl.init.general] bullet 16.9:
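
The discrepancy between the quotient and the literal can be sketched by emulating excess precision explicitly. The helpers below are hypothetical and only mimic what an implementation with FLT_EVAL_METHOD equal to 1 might do internally (evaluating float operands in double); they are not a model of any particular compiler.

```cpp
#include <cassert>

// Hypothetical helper: divides two float operands in double precision,
// emulating an implementation that keeps excess precision for the result.
inline double divide_widened(float num, float den)
{
    assert(den != 0.0f);
    return static_cast<double>(num) / static_cast<double>(den);
}

// Hypothetical helper: the same division, with the excess precision
// discarded by an explicit cast to float.
inline float divide_narrowed(float num, float den)
{
    return static_cast<float>(divide_widened(num, den));
}
```

On an implementation where the literal 3.14f has exactly float precision, one would expect divide_widened(314.f, 100.f) to differ from 3.14f converted to double, whereas divide_narrowed(314.f, 100.f) compares equal to 3.14f.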

...

Otherwise, the initial value of the object being initialized is the (possibly converted) value of the initializer expression. ...

If values produced from literals were permitted to carry excess precision, this phrasing does not seem to convey permission to discard excess precision when converting from a float value to type float ("... is the value..."), apparently requiring that the target object's value also carry the excess precision.

However, if initialization is intended to drop excess precision, then an overloaded operator returning float can never behave like a built-in operation with excess precision, because returning a value means initializing the return value.
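
The overloaded-operator concern can be illustrated with a small sketch. Real is a hypothetical wrapper type introduced only for this example; the point is that the return statement initializes an object of type float, so whatever rule applies to initialization also applies to the result of the user-defined operator.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical wrapper type, used only to give operator/ a class operand.
struct Real { float value; };

// User-defined division. The built-in division inside may be evaluated
// with excess precision, but the return statement initializes the
// returned float from that result.
inline float operator/(Real lhs, Real rhs)
{
    assert(rhs.value != 0.0f);
    return lhs.value / rhs.value;
}

// The overloaded operator has the same result type as built-in float division.
static_assert(std::is_same<decltype(Real{} / Real{}), float>::value,
              "operator/ on Real yields float");
```

If initialization drops excess precision, Real{314.f} / Real{100.f} is always narrowed to float, while the built-in 314.f / 100.f may keep the extra precision; that difference is the asymmetry described above.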

The C++ standard library inherits the FLT_EVAL_METHOD macro from the C standard library. C23 specifies it as follows in section 5.2.5.3.3:

0 evaluate all operations and constants just to the range and precision of the type;

1 evaluate operations and constants of type float and double to the range and precision of the double type, evaluate long double operations and constants to the range and precision of the long double type;

2 evaluate all operations and constants to the range and precision of the long double type.

Taken together, a conforming C++ implementation cannot define FLT_EVAL_METHOD to 1 or 2, because literals (= "constants") cannot be represented with excess precision in C++.
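
To check what a given implementation reports, the macro can be inspected through <cfloat>. The sketch below maps its value to the C23 descriptions quoted above; the function names are hypothetical, and the texts for negative values paraphrase the C standard (-1 is indeterminable, other negative values are implementation-defined).

```cpp
#include <cassert>
#include <cfloat>

// Hypothetical helper: maps an FLT_EVAL_METHOD value to a short description.
inline const char* describe_flt_eval_method(int method)
{
    switch (method) {
    case 0:  return "operations and constants use the range and precision of the type";
    case 1:  return "float and double are evaluated as double, long double as long double";
    case 2:  return "all operations and constants are evaluated as long double";
    case -1: return "indeterminable";
    default: return "implementation-defined";
    }
}

// Hypothetical helper: describes the value reported by this implementation.
inline const char* describe_current_flt_eval_method()
{
    const char* text = describe_flt_eval_method(FLT_EVAL_METHOD);
    assert(text != nullptr);  // every branch returns a string literal
    return text;
}
```

Printing describe_current_flt_eval_method() shows which evaluation method the implementation claims; per the argument above, a value of 1 or 2 would conflict with the current wording of [lex.fcon].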

History
Date	User	Action	Args
2023-06-30 06:11:34	admin	set	messages: + msg7342
2023-06-29 00:00:00	admin	create
