Issue 238: Precision and accuracy constraints on floating point

Title: Precision and accuracy constraints on floating point
Status: cd4
Section: Clause [7] [expr]
Submitter: Christophe de Dinechin

Created on 2000-07-31.00:00:00, last changed 103 months ago

Messages

Date: 2015-09-15.00:00:00

Proposed resolution (September, 2015):

Change 6.8.2 [basic.fundamental] paragraph 8 as follows:

There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. [Note: This International Standard imposes no requirements on the accuracy of floating-point operations; see also 17.3 [support.limits]. —end note] Integral and floating types are collectively called arithmetic types. Specializations of the standard library template std::numeric_limits (17.3 [support.limits]) shall specify the maximum and minimum values of each arithmetic type for an implementation.
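As a rough illustration of the ordering guarantees in the paragraph above, the following sketch queries std::numeric_limits at compile time. The helper name floating_types_are_ordered is hypothetical and does not come from the Standard or from this issue. It assumes the three types share a radix, and it checks only necessary consequences of the wording (digit counts and largest finite values do not decrease), not the full subset relationship. In line with the note, it says nothing about the accuracy of operations.

```cpp
#include <limits>

// Hypothetical helper (assumption, not from the Standard or this issue).
// Returns true when precision and the largest finite value are
// non-decreasing from float to double to long double.
constexpr bool floating_types_are_ordered() {
    using F = std::numeric_limits<float>;
    using D = std::numeric_limits<double>;
    using L = std::numeric_limits<long double>;
    return F::digits <= D::digits && D::digits <= L::digits  // precision
        && F::max() <= D::max() && D::max() <= L::max();     // value range
}

// Expected to hold on an implementation that follows the quoted wording.
static_assert(floating_types_are_ordered(),
              "float, double, long double are not ordered as expected");
```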

msg382

Date: 2016-02-15.00:00:00

[Adopted at the February, 2016 meeting.]

It is not clear what constraints the wording of the Standard places on a floating-point implementation. For instance, is an implementation permitted to generate a "fused multiply-add" instruction if the result would differ from the one obtained by performing the multiplication and the addition separately? To what extent does the "as-if" rule allow the kinds of optimizations (e.g., loop unrolling) performed by FORTRAN compilers?
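The two questions can be made concrete with a minimal sketch, assuming double is IEEE 754 binary64 (the Standard does not require this). All four function names are hypothetical. The first pair contrasts a product that is rounded before the addition with std::fma, which rounds once; the second pair shows that reordering additions, as an unrolled or reassociated loop might, can change a sum.

```cpp
#include <cmath>

// Multiply, round the product to double, then add: two roundings.
// The volatile store is an assumption-level trick to keep the compiler
// from contracting the expression into a fused multiply-add itself.
double separate_mul_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: std::fma computes a * b + c with a single rounding.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Source-order summation: (a + b) + c.
double sum_left_to_right(double a, double b, double c) {
    return (a + b) + c;
}

// Reassociated summation: a + (b + c).
double sum_right_to_left(double a, double b, double c) {
    return a + (b + c);
}
```

Under the binary64 assumption, take a = 1 + 2^-30 and c = -(1 + 2^-29). The exact product a * a is 1 + 2^-29 + 2^-60, which rounds to 1 + 2^-29, so separate_mul_add(a, a, c) yields 0 while fused_mul_add(a, a, c) yields 2^-60. Likewise, with big = 2^100, sum_left_to_right(big, -big, 1.0) yields 1 and sum_right_to_left(big, -big, 1.0) yields 0. Whether an implementation may substitute one form for the other is exactly what the issue asks.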

History
Date	User	Action	Args
2017-02-06 00:00:00	admin	set	status: tentatively ready -> cd4
2015-11-10 00:00:00	admin	set	messages: + msg5593
2015-11-10 00:00:00	admin	set	status: open -> tentatively ready
2000-07-31 00:00:00	admin	create