Issue 1200: "surprising" char_traits<T>::int_type requirements

Title: "surprising" char_traits<T>::int_type requirements
Status: nad
Section: [char.traits.typedefs]
Submitter: Sean Hunt

Created on 2009-09-03.00:00:00

Messages

msg5444

Date: 2010-11-24.14:28:04

[ Moved to NAD at 2010-11 Batavia ]

msg4754

Date: 2010-10-21.19:00:35

[ 2010 Rapperswil: ]

This seems an overspecification, and it is not clear what problem is being solved: these values can be used portably through the named functions, so there is no need for the value itself to be portable. Move to Tentatively NAD.
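The point about the named functions can be illustrated with a minimal sketch. The hypothetical helper count_remaining below reads a stream buffer one code unit at a time and detects end-of-file only through Traits::eq_int_type and Traits::eof(), never by comparing against a literal such as -1 or 0xFFFF, so it compiles unchanged for any character type and traits class.

```cpp
#include <cstddef>
#include <sstream>
#include <streambuf>

// Hypothetical helper: counts the code units remaining in a stream buffer.
// End-of-file is detected only through the traits' named functions
// (eq_int_type and eof()), never by comparing against a literal such as
// -1 or 0xFFFF, so the same code works whatever numeric value an
// implementation chooses for eof().
template <class CharT, class Traits>
std::size_t count_remaining(std::basic_streambuf<CharT, Traits>& buf) {
    std::size_t n = 0;
    for (typename Traits::int_type c = buf.sbumpc();
         !Traits::eq_int_type(c, Traits::eof());
         c = buf.sbumpc()) {
        ++n;
    }
    return n;
}
```

For example, count_remaining returns 3 for a std::stringbuf holding "abc" and 5 for a std::wstringbuf holding L"hello". The loop still assumes that no real code unit converts to a value equal to eof(), which is the assumption the issue calls into question.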

msg1123

Date: 2009-10-28.00:00:00

[ 2009-10-28 Ganesh provides two possible resolutions and expresses a preference for the second: ]

Resolution 1:

Replace [char.traits.specializations.char16_t] para 3 with:

The member eof() shall return ~~an implementation-defined constant that cannot appear as a valid UTF-16 code unit~~ UINT_LEAST16_MAX [Note: this value is guaranteed to be a permanently reserved UCS-2 code position if UINT_LEAST16_MAX == 0xFFFF and it's not a UCS-2 code position otherwise — end note].

Replace [char.traits.specializations.char32_t] para 3 with:

The member eof() shall return ~~an implementation-defined constant that cannot appear as a Unicode code point~~ UINT_LEAST32_MAX [Note: this value is guaranteed to be a permanently reserved UCS-4 code position if UINT_LEAST32_MAX == 0xFFFFFFFF and it's not a UCS-4 code position otherwise — end note].
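The notes in the first proposed resolution rely on how the proposed eof() values relate to Unicode. As a sketch (the helper names are hypothetical), the functions below classify a candidate eof() value: U+FFFF, which is what UINT_LEAST16_MAX equals when uint_least16_t is exactly 16 bits, is one of the 66 permanently reserved noncharacters, while 0xFFFFFFFF, the value of UINT_LEAST32_MAX for a 32-bit uint_least32_t, lies beyond U+10FFFF and so is not a Unicode code point at all.

```cpp
#include <cstdint>
#include <string>

// True if cp is one of the 66 Unicode noncharacters: U+FDD0..U+FDEF plus the
// last two code points (U+xxFFFE, U+xxFFFF) of each of the 17 planes.
constexpr bool is_noncharacter(std::uint32_t cp) {
    return (cp >= 0xFDD0 && cp <= 0xFDEF) ||
           (cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE);
}

// True if cp lies in the Unicode code space U+0000..U+10FFFF.
constexpr bool is_code_point(std::uint32_t cp) { return cp <= 0x10FFFF; }

// Hypothetical helper: describes how a candidate eof() value relates to Unicode.
std::string classify_eof_candidate(std::uint32_t value) {
    if (!is_code_point(value)) return "not a code point";
    if (value >= 0xD800 && value <= 0xDFFF) return "surrogate";
    if (is_noncharacter(value)) return "noncharacter";
    return "ordinary code point";
}
```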

Resolution 2 (preferred):

In [char.traits.specializations.char16_t], in the definition of char_traits<char16_t> replace the definition of nested typedef int_type with:
namespace std {
  template<> struct char_traits<char16_t> {
    typedef char16_t         char_type;
    typedef ~~uint_least16_t~~ uint_fast16_t int_type;
     ...
Replace [char.traits.specializations.char16_t] para 3 with:

The member eof() shall return ~~an implementation-defined constant that cannot appear as a valid UTF-16 code unit~~ UINT_FAST16_MAX [Note: this value is guaranteed to be a permanently reserved UCS-2 code position if UINT_FAST16_MAX == 0xFFFF and it's not a UCS-2 code position otherwise — end note].

In [char.traits.specializations.char32_t], in the definition of char_traits<char32_t> replace the definition of nested typedef int_type with:
namespace std {
  template<> struct char_traits<char32_t> {
    typedef char32_t         char_type;
    typedef ~~uint_least32_t~~ uint_fast32_t int_type;
     ...
Replace [char.traits.specializations.char32_t] para 3 with:

The member eof() shall return ~~an implementation-defined constant that cannot appear as a Unicode code point~~ UINT_FAST32_MAX [Note: this value is guaranteed to be a permanently reserved UCS-4 code position if UINT_FAST32_MAX == 0xFFFFFFFF and it's not a UCS-4 code position otherwise — end note].
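The effect of the second proposed resolution depends on how wide uint_fast16_t actually is, which varies by platform: it is 64 bits with glibc on 64-bit Linux, but an implementation may also make it exactly 16 bits. The hypothetical fast16_traits_sketch below models only the members at issue, not a full std::char_traits, and eof_collisions counts how many char16_t values would be mistaken for end-of-file: zero when uint_fast16_t has more than 16 value bits, one (the code unit 0xFFFF) when it has exactly 16.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical traits fragment modelling the second proposed resolution:
// int_type widened to uint_fast16_t and eof() returning UINT_FAST16_MAX.
// Only the members relevant to the issue are shown; this is not std::char_traits.
struct fast16_traits_sketch {
    using char_type = char16_t;
    using int_type  = std::uint_fast16_t;

    static constexpr int_type eof() noexcept { return UINT_FAST16_MAX; }
    static constexpr int_type to_int_type(char_type c) noexcept {
        return static_cast<int_type>(c);
    }
    static constexpr bool eq_int_type(int_type a, int_type b) noexcept {
        return a == b;
    }
};

// Counts the code units 0x0000..0xFFFF whose int_type compares equal to eof().
template <class Traits>
std::size_t eof_collisions() {
    std::size_t n = 0;
    for (std::uint_least32_t v = 0; v <= 0xFFFF; ++v) {
        const auto c = static_cast<typename Traits::char_type>(v);
        if (Traits::eq_int_type(Traits::to_int_type(c), Traits::eof())) ++n;
    }
    return n;
}

// True when uint_fast16_t has room for a value that no 16-bit code unit uses.
constexpr bool fast16_is_wider_than_16_bits() {
    return std::numeric_limits<std::uint_fast16_t>::digits > 16;
}
```

Under this model the count depends on the platform's choice of uint_fast16_t, not on the wording of the resolution.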

msg1122

Date: 2009-09-03.00:00:00

The footnote for int_type in [char.traits.typedefs] says that

If eof() can be held in char_type then some iostreams implementations may give surprising results.

This implies that int_type should be a superset of char_type. However, the specializations for char16_t and char32_t define int_type to be uint_least16_t and uint_least32_t respectively. Since char16_t has the same size as uint_least16_t, int_type has no value to spare for eof(), which may lead to surprising behavior even if eof() is not a valid UTF-16 code unit. The standard should not prescribe surprising behavior, especially without saying what that behavior is (it is apparently not undefined, just surprising). The same applies to the 32-bit types.

I personally recommend that the behavior be undefined if eof() can be held in char_type, and that another type be chosen for int_type (my personal favorite has always been a struct {bool eof; char_type c;}). Alternatively, the exact results of such a situation should be defined, at least to the extent that I/O could be conducted on these types as long as the code units remain valid. Note that the argument that no one streams char16_t or char32_t is not really valid, as it would be perfectly reasonable to use a basic_stringstream in conjunction with UTF character types.
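The struct suggested above can be sketched as follows. This is an illustration only, using hypothetical names (tagged_int16, tagged_traits_sketch); it is not std::char_traits and has not been checked against every iostream component that manipulates int_type. Because end-of-file is carried in a separate flag, eq_int_type can never confuse a real code unit with eof(), and tagged_eof_collisions confirms this for all 65,536 values of a 16-bit code unit.

```cpp
#include <cstdint>

// Hypothetical int_type following the suggestion struct {bool eof; char_type c;}:
// end-of-file is a separate flag rather than a reserved code-unit value.
struct tagged_int16 {
    bool eof;
    char16_t c;
};

// Hypothetical traits-like helpers (not std::char_traits).
struct tagged_traits_sketch {
    using char_type = char16_t;
    using int_type  = tagged_int16;

    static constexpr int_type eof() noexcept { return int_type{true, u'\0'}; }
    static constexpr int_type to_int_type(char_type c) noexcept {
        return int_type{false, c};
    }
    static constexpr char_type to_char_type(int_type i) noexcept { return i.c; }
    // All eof values compare equal; otherwise both fields must match.
    static constexpr bool eq_int_type(int_type a, int_type b) noexcept {
        return a.eof == b.eof && (a.eof || a.c == b.c);
    }
    // Maps eof() to some non-eof value and leaves everything else unchanged.
    static constexpr int_type not_eof(int_type i) noexcept {
        return i.eof ? int_type{false, u'\0'} : i;
    }
};

// Counts code units 0x0000..0xFFFF that would compare equal to eof().
inline std::uint32_t tagged_eof_collisions() {
    using T = tagged_traits_sketch;
    std::uint32_t n = 0;
    for (std::uint32_t v = 0; v <= 0xFFFF; ++v) {
        if (T::eq_int_type(T::to_int_type(static_cast<char16_t>(v)), T::eof())) ++n;
    }
    return n;
}
```

The cost of this design is that int_type is no longer an integer, so any code that relies on integer operations on int_type, rather than on the traits' named functions, would not compile against it.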

History
Date	User	Action	Args
2010-11-24 14:28:04	admin	set	messages: + msg5444
2010-10-21 19:00:35	admin	set	messages: + msg4754
2010-10-21 19:00:35	admin	set	status: new -> nad
2010-10-21 18:28:33	admin	set	messages: + msg1123
2009-09-03 00:00:00	admin	create