Title
"surprising" char_traits<T>::int_type requirements
Status
nad
Section
[char.traits.typedefs]
Submitter
Sean Hunt

Created on 2009-09-03.00:00:00 last changed 170 months ago

Messages

Date: 2010-11-24.14:28:04

[ Moved to NAD at 2010-11 Batavia ]

Date: 2010-10-21.19:00:35

[ 2010 Rapperswil: ]

This seems an overspecification, and it is not clear what problem is being solved - these values can be used portably by using the named functions; there is no need for the value itself to be portable. Move to Tentatively NAD.
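A minimal sketch of what "using the named functions" can look like in practice. The helper count_until_eof is hypothetical (it is not part of the standard library or of this issue); it compares against the end-of-file marker only through Traits::eof() and Traits::eq_int_type, so it never depends on the numeric value of eof().

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts the int_type values that precede the first
// end-of-file marker. Only the named char_traits functions are used, so the
// actual value returned by eof() never appears in the code.
template <class CharT, class Traits = std::char_traits<CharT>>
std::size_t count_until_eof(const typename Traits::int_type* values,
                            std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (Traits::eq_int_type(values[i], Traits::eof())) {
            break;  // stop at the end-of-file marker
        }
        ++count;
    }
    return count;
}
```

Values would be produced with Traits::to_int_type and converted back with Traits::to_char_type, again without naming the representation.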

Date: 2009-10-28.00:00:00

[ 2009-10-28 Ganesh provides two possible resolutions and expresses a preference for the second: ]

  1. Replace [char.traits.specializations.char16_t] para 3 with:

    The member eof() shall return UINT_LEAST16_MAX [Note: this value is guaranteed to be a permanently reserved UCS-2 code position if UINT_LEAST16_MAX == 0xFFFF and it's not a UCS-2 code position otherwise — end note].

    Replace [char.traits.specializations.char32_t] para 3 with:

    The member eof() shall return UINT_LEAST32_MAX [Note: this value is guaranteed to be a permanently reserved UCS-4 code position if UINT_LEAST32_MAX == 0xFFFFFFFF and it's not a UCS-4 code position otherwise — end note].

  2. In [char.traits.specializations.char16_t], in the definition of char_traits<char16_t> replace the definition of nested typedef int_type with:

    namespace std {
      template<> struct char_traits<char16_t> {
        typedef char16_t         char_type;
        typedef uint_fast16_t    int_type;
         ...
    

    Replace [char.traits.specializations.char16_t] para 3 with:

    The member eof() shall return UINT_FAST16_MAX [Note: this value is guaranteed to be a permanently reserved UCS-2 code position if UINT_FAST16_MAX == 0xFFFF and it's not a UCS-2 code position otherwise — end note].

    In [char.traits.specializations.char32_t], in the definition of char_traits<char32_t> replace the definition of nested typedef int_type with:

    namespace std {
      template<> struct char_traits<char32_t> {
        typedef char32_t         char_type;
        typedef uint_fast32_t    int_type;
         ...
    

    Replace [char.traits.specializations.char32_t] para 3 with:

    The member eof() shall return UINT_FAST32_MAX [Note: this value is guaranteed to be a permanently reserved UCS-4 code position if UINT_FAST32_MAX == 0xFFFFFFFF and it's not a UCS-4 code position otherwise — end note].
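The difference between the two options can be sketched with a couple of compile-time checks. The helper names below are hypothetical and only illustrate the arithmetic: a candidate eof() value that is at most 0xFFFF can still be stored in a 16-bit code unit, whereas a value above 0x10FFFF lies beyond every Unicode code point. Whether UINT_FAST16_MAX exceeds 0xFFFF depends on the platform, so that result is computed rather than assumed.

```cpp
#include <cstdint>

// Hypothetical helper: true if the value can be stored in a 16-bit code unit.
constexpr bool fits_in_16_bit_code_unit(std::uintmax_t v) {
    return v <= 0xFFFFu;
}

// Hypothetical helper: true if the value lies above U+10FFFF, the largest
// Unicode code point.
constexpr bool above_unicode_range(std::uintmax_t v) {
    return v > 0x10FFFFu;
}

// Evaluated for the constants that options 1 and 2 name for char16_t.
// The results are platform-dependent.
constexpr bool option1_eof16_fits = fits_in_16_bit_code_unit(UINT_LEAST16_MAX);
constexpr bool option2_eof16_fits = fits_in_16_bit_code_unit(UINT_FAST16_MAX);
```

On a platform where uint_fast16_t is wider than 16 bits, option2_eof16_fits is false, which may be one reason to prefer the second option; the issue text itself does not state the rationale.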

Date: 2009-09-03.00:00:00

The footnote for int_type in [char.traits.typedefs] says that

If eof() can be held in char_type then some iostreams implementations may give surprising results.

This implies that int_type should be a superset of char_type. However, the requirements for char16_t and char32_t define int_type to be equal to uint_least16_t and uint_least32_t respectively. uint_least16_t is likely to be the same size as char16_t, which may lead to surprising behavior, even if eof() is not a valid UTF-16 code unit. The standard should not prescribe surprising behavior, especially without saying what it is (it's apparently not undefined, just surprising). The same applies for 32-bit types.
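The concern can be probed with a small sketch. Both helpers are hypothetical names introduced here for illustration. The first checks whether int_type has more storage than char_type; the second checks whether eof() survives a round trip through char_type, which is one way to read "eof() can be held in char_type". The results for char16_t and char32_t vary between library implementations, so none are stated here.

```cpp
#include <string>

// Hypothetical helper: true when int_type occupies more storage than
// char_type, leaving room for an eof() value outside the range of char_type.
template <class CharT, class Traits = std::char_traits<CharT>>
constexpr bool int_type_is_wider() {
    return sizeof(typename Traits::int_type) >
           sizeof(typename Traits::char_type);
}

// Hypothetical helper: true when converting eof() to char_type and back
// yields a value that still compares equal to eof(), i.e. some character
// value is indistinguishable from end-of-file.
template <class CharT, class Traits = std::char_traits<CharT>>
bool eof_fits_in_char_type() {
    const auto e = Traits::eof();
    return Traits::eq_int_type(Traits::to_int_type(Traits::to_char_type(e)), e);
}
```

For plain char on common platforms, where int_type is int, the first check is true and the second is false; that is the situation the footnote treats as unsurprising.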

I personally recommend that behavior be undefined if eof() is a member of char_type, and that another type be chosen for int_type (my personal favorite has always been a struct {bool eof; char_type c;}). Alternatively, the exact results of such a situation should be defined, at least to the extent that I/O can be conducted on these types as long as the code units remain valid. Note that the argument that no one streams char16_t or char32_t is not really valid, as it would be perfectly reasonable to use a basic_stringstream in conjunction with UTF character types.
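A minimal sketch of the struct-based int_type the submitter mentions, written as a user-defined traits class. The names eof_or_char16 and wide_eof_traits are hypothetical, and the sketch has not been checked against any particular iostreams implementation; it only shows that a separate eof flag keeps every char16_t value distinct from end-of-file.

```cpp
#include <string>

// Hypothetical int_type: an explicit end-of-file flag next to the character.
struct eof_or_char16 {
    bool eof;
    char16_t c;
};

// Hypothetical traits class: reuses std::char_traits<char16_t> for the
// character operations and replaces only the int_type-related members.
struct wide_eof_traits : std::char_traits<char16_t> {
    typedef eof_or_char16 int_type;

    static int_type eof() { return int_type{true, char16_t()}; }
    static int_type to_int_type(char_type ch) { return int_type{false, ch}; }
    static char_type to_char_type(int_type i) { return i.c; }

    // Two end-of-file values are equal; otherwise compare the characters.
    static bool eq_int_type(int_type a, int_type b) {
        return a.eof == b.eof && (a.eof || a.c == b.c);
    }

    // Returns the argument unless it is end-of-file, in which case some
    // non-eof value is returned.
    static int_type not_eof(int_type i) {
        return i.eof ? int_type{false, char16_t()} : i;
    }
};
```

With this representation even the code unit 0xFFFF converts to an int_type that does not compare equal to eof(), at the cost of a larger int_type.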

History
Date                 User   Action  Args
2010-11-24 14:28:04  admin  set     messages: + msg5444
2010-10-21 19:00:35  admin  set     messages: + msg4754
2010-10-21 19:00:35  admin  set     status: new -> nad
2010-10-21 18:28:33  admin  set     messages: + msg1123
2009-09-03 00:00:00  admin  create