Title: istream >> char and eofbit
Status: NAD
Section: [istream]
Submitter: Howard Hinnant

Created on 2011-02-27.00:00:00 last changed 164 months ago

Messages

Date: 2011-06-02.15:28:39

Rationale:

Reading the last character does not set eofbit, and the standard already says so.

Date: 2011-03-24.00:00:00

[ 2011-03-24 Madrid meeting ]

Dietmar convinced Howard that the standard already says the right words.

Date: 2011-02-28.00:00:00

[ 2011-02-28: Martin Sebor comments ]

[Responds to bullet 1 of Jean-Marc's list]

Yes, this matches the stdcxx test suite for num_get and time_get, but not for money_get when the currency symbol is last. I don't see where in [locale.money.get.virtuals] we specify whether and when eofbit is set.

IMO, if we try to make the char extractor consistent, we should also fix all the other extractors and manipulators that aren't consistent (including std::get_money and std::get_time).

Date: 2011-02-27.00:00:00

[ 2011-02-27: Jean-Marc Bourguet comments ]

Just for completeness: it [the counterexample] doesn't prevent the next line from being read; it prevents the prompt from being output at the appropriate time.

More information to take into account when deciding:

  • if I'm reading correctly the section on extracting boolean values when boolalpha is set, we mandate there that eofbit isn't set if reading past the end of the pending sequence wasn't needed to determine the result.

  • see also the behaviour of getline (which isn't a formatted input function, but won't set eofbit if end-of-file occurs just after the delimiter)

  • if I'm reading the C standard correctly, scanf("%c") wouldn't set feof either in that situation.

Date: 2011-02-27.00:00:00

The question is: When a single character is extracted from an istream using operator>>, does eofbit get set if this is the last character extracted from the stream? The current standard is at best ambiguous on the subject. [istream]/p3 describes all extraction operations with:

3 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure ([iostate.flags]), before returning.

And [istream::extractors]/p12, describing operator>>(basic_istream<charT,traits>& in, charT& c), offers no further clarification:

12 Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in. After a sentry object is constructed a character is extracted from in, if one is available, and stored in c. Otherwise, the function calls in.setstate(failbit).

I coded it one way in libc++, and g++ coded it another way. Chris Jefferson noted that some boost code was sensitive to the difference and fails for libc++. Therefore I believe that it is very important that we specify this extraction operator in enough detail that both vendors and clients know what behavior is required and expected.

Here is a brief code example demonstrating the issue:

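#include <sstream>
#include <cassert>

int main()
{
  std::istringstream ss("1");
  char t;
  ss >> t;            // extracts '1', the last character in the stream
  assert(!ss.eof());  // holds only if extracting the last char does not set eofbit
}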

For every type other than char that can be read from this istringstream (bool, int, double, etc.), ss.eof() will be true after the extraction. So for consistency's sake we might want char to behave the same way as the other built-in types.

However, Jean-Marc Bourguet offers this counterexample using an interactive stream. He argues that setting eofbit inhibits reading the next line:

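#include <iostream>

int main()
{
  char c;
  std::cin >> std::noskipws;  // read every character, including '\n'
  std::cout << "First line: ";
  while (std::cin >> c) {
    if (c == '\n') {
      // Printed on time only if extracting '\n' did not first wait
      // for more input in order to decide whether to set eofbit.
      std::cout << "Next line: ";
    }
  }
}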

As these two code examples demonstrate, whether or not eofbit is set is observable, and the difference affects real-world code. I feel it is critical that we clearly and unambiguously choose one behavior or the other. I am proposing wording for both behaviors and ask the LWG to choose one (and only one!).

Wording for setting eofbit:

Modify [istream::extractors]/p12 as follows, deleting the sentence "Otherwise, the function calls in.setstate(failbit)." and adding the two sentences that now end the paragraph:

12 Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in. After a sentry object is constructed a character is extracted from in, if one is available, and stored in c. If a character is extracted and it is the last character in the pending sequence, the function calls in.setstate(eofbit). If a character is not extracted the function calls in.setstate(failbit | eofbit).

Wording for not setting eofbit:

Modify [istream::extractors]/p12 as follows, replacing ", if one is available, and stored in c. Otherwise, the function calls in.setstate(failbit)." with the text that now follows "extracted from in":

12 Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in. After a sentry object is constructed a character is extracted from in with in.rdbuf()->sbumpc(). If traits::eof() is returned, the function calls in.setstate(failbit | eofbit). Otherwise the return value is converted to type charT and stored in c.

History
Date                 User   Action  Args
2011-06-02 15:28:39  admin  set     messages: + msg5807
2011-03-24 16:58:37  admin  set     messages: + msg5691
2011-03-24 16:58:37  admin  set     status: new -> nad
2011-03-01 18:58:09  admin  set     messages: + msg5567
2011-03-01 18:58:09  admin  set     messages: + msg5566
2011-02-27 00:00:00  admin  create