Title
Excess-precision floating-point literals
Status
open
Section
5.13.4 [lex.fcon]
Submitter
Peter Dimov

Created on 2023-06-29.00:00:00 last changed 10 months ago

Messages

Date: 2023-06-15.00:00:00

Additional notes (June, 2023)

Forwarded to EWG via cplusplus/papers#1584, by decision of the CWG chair.

Date: 2023-06-30.06:11:34

Consider:

  #include <cassert>

  int main()
  {
    constexpr auto x = 3.14f;
    assert( x == 3.14f );         // can fail?
    static_assert( x == 3.14f );  // can fail?
  }

Can a conforming implementation represent a floating-point literal with excess precision, causing the comparisons to fail?

Subclause 5.13.4 [lex.fcon] paragraph 3 specifies:

If the scaled value is not in the range of representable values for its type, the program is ill-formed. Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.

This phrasing leaves little leeway for excess precision. In contrast, C23 (WG14 N3096) specifies in section 6.4.4.2 paragraph 6:

The values of floating constants may be represented in greater range and precision than that required by the type (determined by the suffix); the types are not changed thereby. ...
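As a rough illustration of the [lex.fcon] rule quoted above, the following sketch checks whether a float value is one of the two representable values that bracket a given scaled value. The helper name is_adjacent_representable is hypothetical, and passing the scaled value as a double is itself an approximation of the exact decimal value.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper, for illustration only.
// Returns true if `lit` equals `scaled`, or if `scaled` lies strictly between
// `lit` and the neighbouring float in the direction of `scaled`.
// Assumption: `scaled` (a double) stands in for the exact scaled value.
inline bool is_adjacent_representable(double scaled, float lit)
{
    const double v = static_cast<double>(lit);
    if (v == scaled) {
        return true;  // the scaled value is exactly representable
    }
    const float inf = std::numeric_limits<float>::infinity();
    // Step one float from `lit` towards the scaled value.
    const float neighbour = std::nextafter(lit, v < scaled ? inf : -inf);
    const double w = static_cast<double>(neighbour);
    return v < scaled ? (scaled < w) : (w < scaled);
}
```

Under the C++ wording, 3.14f must satisfy this check for the scaled value 3.14; a literal carrying excess precision would instead hold a value lying between two adjacent float values.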

Subclause 7.1 [expr.pre] paragraph 6 allows excess precision for floating-point computations (including their operands):

The values of the floating-point operands and the results of floating-point expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby. [ Footnote: The cast and assignment operators must still perform their specific conversions as described in 7.6.1.4 [expr.type.conv], 7.6.3 [expr.cast], 7.6.1.9 [expr.static.cast] and 7.6.19 [expr.ass]. -- end footnote ]

Taken together, that means that 314.f / 100.f can be computed and represented more precisely than 3.14f, which is hard to justify. The footnote appears to imply that (float)3.14f is required to yield a value with float precision, but that conversion (eventually) ends up at 9.4.1 [dcl.init.general] bullet 16.9:

  • ...
  • Otherwise, the initial value of the object being initialized is the (possibly converted) value of the initializer expression. ...

This phrasing leaves no permission to discard excess precision when converting from a float value to type float ("... is the value...").
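A workaround sometimes used in practice is to pass a value through a float object before comparing it. The sketch below assumes that a store to a volatile float rounds to float precision, which is common implementation behavior but is not clearly guaranteed by the wording discussed here. The names force_float and quotient_matches_literal are hypothetical.

```cpp
// Hypothetical helper: stores the value in a volatile float object and reads
// it back. Assumption: the store discards any excess precision; the standard
// wording under discussion does not clearly require this.
inline float force_float(float value)
{
    volatile float stored = value;
    return stored;
}

// Compares the computed quotient with the literal after both have been
// passed through a float object.
inline bool quotient_matches_literal()
{
    return force_float(314.f / 100.f) == force_float(3.14f);
}
```

Without the helper, the comparison 314.f / 100.f == 3.14f may compare a quotient held in greater precision against a literal that has been rounded to float.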

However, if initialization is intended to drop excess precision, then an overloaded operator returning float can never behave like a built-in operation with excess precision, because returning a value means initializing the return value.

The C++ standard library inherits the FLT_EVAL_METHOD macro from the C standard library. C23 (WG14 N3096) specifies it as follows in section 5.2.4.2.2:

0 evaluate all operations and constants just to the range and precision of the type;
1 evaluate operations and constants of type float and double to the range and precision of the double type, evaluate long double operations and constants to the range and precision of the long double type;
2 evaluate all operations and constants to the range and precision of the long double type.

Taken together, a conforming C++ implementation cannot define FLT_EVAL_METHOD to 1 or 2, because literals (= "constants") cannot be represented with excess precision in C++.
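To see which evaluation method an implementation reports, a program can inspect FLT_EVAL_METHOD from <cfloat>. The sketch below maps the three values listed above to short descriptions. The function names are hypothetical and the description strings are paraphrases, not standard text.

```cpp
#include <cfloat>
#include <string>

// Hypothetical helper: maps a FLT_EVAL_METHOD value to a short description.
inline std::string describe_eval_method(int method)
{
    switch (method) {
    case 0:  return "type";         // range and precision of the type
    case 1:  return "double";       // float and double evaluated as double
    case 2:  return "long double";  // everything evaluated as long double
    default: return "other";        // indeterminable or implementation-defined
    }
}

// Convenience wrapper for the implementation's own setting.
inline std::string current_eval_method()
{
    return describe_eval_method(FLT_EVAL_METHOD);
}
```

If current_eval_method() returns anything other than "type", the implementation reports excess precision for constants as well as operations, which is the case that the C++ wording for literals appears to rule out.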

History
Date                 User   Action  Args
2023-06-30 06:11:34  admin  set     messages: + msg7342
2023-06-29 00:00:00  admin  create