Issue 662: Inconsistent handling of incorrectly-placed thousands separators

Title: Inconsistent handling of incorrectly-placed thousands separators
Status: nad
Section: [facet.num.get.virtuals]
Submitter: Cosmin Truta

Created on 2007-04-05.00:00:00 last changed 179 months ago

Messages

Date: 2011-03-05.00:04:13

Rationale:

post-Toronto: Changed from New to NAD at the request of the author. The preferred solution of N2327 makes this resolution obsolete.

msg3364 (view)

Date: 2010-10-21.18:28:33

Proposed resolution:

Change [facet.num.get.virtuals]:

Stage 3: The result of stage 2 processing can be one of

A sequence of chars has been accumulated in stage 2 that is converted (according to the rules of scanf) to a value of the type of val. ~~This value is stored in val and ios_base::goodbit is stored in err.~~

The sequence of chars accumulated in stage 2 would have caused scanf to report an input failure. ios_base::failbit is assigned to err.

In the first case, Ddigit grouping is checked. That is, the positions of discarded separators is examined for consistency with use_facet<numpunct<charT> >(loc).grouping(). If they are not consistent then ios_base::failbit is assigned to err. Otherwise, the value that was converted in stage 2 is stored in val and ios_base::goodbit is stored in err.

msg3363 (view)

Date: 2007-04-05.00:00:00

From Section [facet.num.get.virtuals], paragraphs 11 and 12, it is implied that the value read from a stream must be stored even if the placement of thousands separators does not conform to the grouping() specification from the numpunct facet. Since incorrectly-placed thousands separators are flagged as an extraction failure (by the means of failbit), we believe it is better not to store the value. A consistent strategy, in which any kind of extraction failure leaves the input item intact, is conceptually cleaner, is able to avoid corner-case traps, and is also more understandable from the programmer's point of view.

Here is a quote from "The C++ Programming Language (Special Edition)" by B. Stroustrup (Section D.4.2.3, pg. 897):

"If a value of the desired type could not be read, failbit is set in r. [...] An input operator will use r to determine how to set the state of its stream. If no error was encountered, the value read is assigned through v; otherwise, v is left unchanged."

This statement implies that rdstate() alone is sufficient to determine whether an extracted value is to be assigned to the input item val passed to do_get. However, this is in disagreement with the current C++ Standard. The above-mentioned assumption is true in all cases, except when there are mismatches in digit grouping. In the latter case, the parsed value is assigned to val, and, at the same time, err is assigned to ios_base::failbit (essentially "lying" about the success of the operation). Is this intentional? The current behavior raises both consistency and usability concerns.

Although digit grouping is outside the scope of scanf (on which the virtual methods of num_get are based), handling of grouping should be consistent with the overall behavior of scanf. The specification of scanf makes a distinction between input failures and matching failures, and yet both kinds of failures have no effect on the input items passed to scanf. A mismatch in digit grouping logically falls in the category of matching failures, and it would be more consistent, and less surprising to the user, to leave the input item intact whenever a failure is being signaled.

The extraction of bool is another example outside the scope of scanf, and yet consistent, even in the event of a successful extraction of a long but a failed conversion from long to bool.

Inconsistency is further aggravated by the fact that, when failbit is set, subsequent extraction operations are no-ops until failbit is explicitly cleared. Assuming that there is no explicit handling of rdstate() (as in cin>>i>>j) it is counter-intuitive to be able to extract an integer with mismatched digit grouping, but to be unable to extract another, properly-formatted integer that immediately follows.

Moreover, setting failbit, and selectively assigning a value to the input item, raises usability problems. Either the strategy of scanf (when there is no extracted value in case of failure), or the strategy of the strtol family (when there is always an extracted value, and there are well-defined defaults in case of a failure) are easy to understand and easy to use. On the other hand, if failbit alone cannot consistently make a difference between a failed extraction, and a successful but not-quite-correct extraction whose output happens to be the same as the previous value, the programmer must resort to implementation tricks. Consider the following example:

    int i = old_i;
    cin >> i;
    if (cin.fail())
        // can the value of i be trusted?
        // what does it mean if i == old_i?
        // ...

Last but not least, the current behvaior is not only confusing to the casual reader, but it has also been confusing to some book authors. Besides Stroustrup's book, other books (e.g. "Standard C++ IOStreams and Locales" by Langer and Kreft) are describing the same mistaken assumption. Although books are not to be used instead of the standard reference, the readers of these books, as well as the people who are generally familiar to scanf, are even more likely to misinterpret the standard, and expect the input items to remain intact when a failure occurs.

History
Date	User	Action	Args
2010-10-21 18:28:33	admin	set	messages: + msg3365
2010-10-21 18:28:33	admin	set	messages: + msg3364
2007-04-05 00:00:00	admin	create