Created on 2024-01-24.00:00:00 last changed yesterday
Proposed resolution:
This wording is relative to N4971.
Modify [ostream.formatted.print] as indicated:
void vprint_unicode(ostream& os, string_view fmt, format_args args);
void vprint_nonunicode(ostream& os, string_view fmt, format_args args);
-3- Effects: Behaves as a formatted output function ([ostream.formatted.reqmts]) of `os`, except that:
- (3.1) – failure to generate output is reported as specified below, and
- (3.2) – any exception thrown by the call to `vformat` is propagated without regard to the value of `os.exceptions()` and without turning on `ios_base::badbit` in the error state of `os`.
<ins>-?-</ins> After constructing a `sentry` object, the function initializes an automatic variable via
string out = vformat(os.getloc(), fmt, args);
- <ins>(?.1) –</ins> If the function is `vprint_unicode` and `os` is a stream that refers to a terminal <ins>that is only</ins> capable of displaying Unicode <ins>via a native Unicode API</ins>, which is determined in an implementation-defined manner, <ins>flushes `os` and then</ins> writes `out` to the terminal using the native Unicode API; if `out` contains invalid code units, the behavior is undefined<del> and implementations are encouraged to diagnose it</del>. <del>If the native Unicode API is used, the function flushes `os` before writing `out`.</del>
- <ins>(?.2) –</ins> Otherwise, <del>(if `os` is not such a stream or the function is `vprint_nonunicode`),</del> inserts the character sequence [`out.begin()`, `out.end()`) into `os`.
<ins>-?-</ins> If writing to the terminal or inserting into `os` fails, calls `os.setstate(ios_base::badbit)` (which may throw `ios_base::failure`).
-4- Recommended practice: For `vprint_unicode`, if invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion.
Modify [print.fun] as indicated:
void vprint_unicode(FILE* stream, string_view fmt, format_args args);
-6- Preconditions: `stream` is a valid pointer to an output C stream.
-7- Effects: The function initializes an automatic variable via
string out = vformat(fmt, args);
- <ins>(7.1) –</ins> If `stream` refers to a terminal <ins>that is only</ins> capable of displaying Unicode <ins>via a native Unicode API</ins>, <ins>flushes `stream` and then</ins> writes `out` to the terminal using the native Unicode API; if `out` contains invalid code units, the behavior is undefined<del> and implementations are encouraged to diagnose it</del>.
- <ins>(7.2) –</ins> Otherwise writes `out` to `stream` unchanged.
<del>If the native Unicode API is used, the function flushes `stream` before writing `out`.</del>
[Note 1: On <del>POSIX and </del>Windows, <ins>the native Unicode API is `WriteConsoleW` and </ins>`stream` referring to a terminal means that<del>, respectively, `isatty(fileno(stream))` and</del> `GetConsoleMode(_get_osfhandle(_fileno(stream)), ...)` returns nonzero. — end note]
<del>[Note 2: On Windows, the native Unicode API is `WriteConsoleW`. — end note]</del>
-8- Throws: [...]
-9- Recommended practice: If invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion.
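
To illustrate the dispatch in (7.1) and (7.2), here is a hedged sketch of the write step only. It takes text that has already been formatted, and the helper name `sketch_write_unicode` is hypothetical. The Windows branch assumes that every console needs the native API and that `out` holds UTF-8; a real implementation may make both decisions differently. On other platforms the function writes straight to the stream without any `isatty` check.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

#ifdef _WIN32
#include <io.h>
#include <windows.h>
#endif

// Hypothetical helper: writes already-formatted text to `stream`.
// Returns false if the write fails.
bool sketch_write_unicode(std::FILE* stream, std::string_view out) {
#ifdef _WIN32
    HANDLE h = reinterpret_cast<HANDLE>(_get_osfhandle(_fileno(stream)));
    DWORD mode = 0;
    if (h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode)) {
        // (7.1): flush the C stream, then write through WriteConsoleW.
        if (std::fflush(stream) != 0) return false;
        if (out.empty()) return true;
        // Transcode UTF-8 to UTF-16 (assumes `out` is UTF-8).
        const int len = static_cast<int>(out.size());
        const int n = MultiByteToWideChar(CP_UTF8, 0, out.data(), len, nullptr, 0);
        if (n <= 0) return false;
        std::wstring wide(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, out.data(), len, wide.data(), n);
        DWORD written = 0;
        return WriteConsoleW(h, wide.data(), static_cast<DWORD>(n), &written, nullptr) != 0;
    }
#endif
    // (7.2): write `out` to the stream unchanged.
    return std::fwrite(out.data(), 1, out.size(), stream) == out.size();
}
```

Because the terminal check lives entirely inside the `_WIN32` branch, a POSIX build pays no extra system call per print.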
[ St. Louis 2024-06-24; move to Ready. ]
[ 2024-03-19; Tokyo: Jonathan updates wording after LWG review ]
Split the Effects: into separate bullets for the "native Unicode API" and "otherwise" cases. Remove the now-redundant "if `os` is not such a stream" parenthesis.
[ 2024-03-12; Jonathan updates wording based on SG16 feedback ]
SG16 reviewed the issue and approved the proposed resolution with the wording about diagnosing invalid code units removed.
SG16 favors removing the following text (both occurrences) from the proposed wording. This is motivated by a lack of understanding regarding what it means to diagnose such invalid code unit sequences given that the input is likely provided at run-time.
If invoking the native Unicode API does not require transcoding, implementations are encouraged to diagnose invalid code units.
Some concern was expressed regarding how the current wording is structured. At present, the wording leads with a Windows centric perspective; if the stream refers to a terminal ... use the native Unicode API ... otherwise write code units to the stream. It might be an improvement to structure the wording such that use of the native Unicode API is presented as a fallback for implementations that require its use when writing directly to the stream is not sufficient to produce desired results. In other words, the wording should permit direct writing to the stream even when the stream is directed to a terminal and a native Unicode API is available when the implementation has reason to believe that doing so will produce the correct results. For example, Microsoft's HoloLens has a Windows based operating system, but it only supports use of UTF-8 as the system code page and therefore would not require the native Unicode API bypass; implementations for it could avoid the overhead of checking to see if the stream is directed to a console.
This wording is relative to N4971.
Modify [ostream.formatted.print] as indicated:
void vprint_unicode(ostream& os, string_view fmt, format_args args);
void vprint_nonunicode(ostream& os, string_view fmt, format_args args);
-3- Effects: Behaves as a formatted output function ([ostream.formatted.reqmts]) of `os`, except that:
- (3.1) – failure to generate output is reported as specified below, and
- (3.2) – any exception thrown by the call to `vformat` is propagated without regard to the value of `os.exceptions()` and without turning on `ios_base::badbit` in the error state of `os`.
After constructing a `sentry` object, the function initializes an automatic variable via
string out = vformat(os.getloc(), fmt, args);
If the function is `vprint_unicode` and `os` is a stream that refers to a terminal <ins>that is only</ins> capable of displaying Unicode <ins>via a native Unicode API</ins>, which is determined in an implementation-defined manner, <ins>flushes `os` and then</ins> writes `out` to the terminal using the native Unicode API; if `out` contains invalid code units, the behavior is undefined<del> and implementations are encouraged to diagnose it</del>. <del>If the native Unicode API is used, the function flushes `os` before writing `out`.</del> Otherwise, (if `os` is not such a stream or the function is `vprint_nonunicode`), inserts the character sequence [`out.begin()`, `out.end()`) into `os`. If writing to the terminal or inserting into `os` fails, calls `os.setstate(ios_base::badbit)` (which may throw `ios_base::failure`).
-4- Recommended practice: For `vprint_unicode`, if invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion.
Modify [print.fun] as indicated:
void vprint_unicode(FILE* stream, string_view fmt, format_args args);
-6- Preconditions: `stream` is a valid pointer to an output C stream.
-7- Effects: The function initializes an automatic variable via
string out = vformat(fmt, args);
If `stream` refers to a terminal <ins>that is only</ins> capable of displaying Unicode <ins>via a native Unicode API</ins>, <ins>flushes `stream` and then</ins> writes `out` to the terminal using the native Unicode API; if `out` contains invalid code units, the behavior is undefined<del> and implementations are encouraged to diagnose it</del>. Otherwise writes `out` to `stream` unchanged. <del>If the native Unicode API is used, the function flushes `stream` before writing `out`.</del>
[Note 1: On <del>POSIX and </del>Windows, <ins>the native Unicode API is `WriteConsoleW` and </ins>`stream` referring to a terminal means that<del>, respectively, `isatty(fileno(stream))` and</del> `GetConsoleMode(_get_osfhandle(_fileno(stream)), ...)` return nonzero. — end note]
<del>[Note 2: On Windows, the native Unicode API is `WriteConsoleW`. — end note]</del>
-8- Throws: [...]
-9- Recommended practice: If invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion.
[ 2024-03-12; Reflector poll ]
Set priority to 3 after reflector poll and send to SG16.
This wording is relative to N4971.
Modify [ostream.formatted.print] as indicated:
void vprint_unicode(ostream& os, string_view fmt, format_args args);
void vprint_nonunicode(ostream& os, string_view fmt, format_args args);
-3- Effects: Behaves as a formatted output function ([ostream.formatted.reqmts]) of `os`, except that:
- (3.1) – failure to generate output is reported as specified below, and
- (3.2) – any exception thrown by the call to `vformat` is propagated without regard to the value of `os.exceptions()` and without turning on `ios_base::badbit` in the error state of `os`.
After constructing a `sentry` object, the function initializes an automatic variable via
string out = vformat(os.getloc(), fmt, args);
If the function is `vprint_unicode` and `os` is a stream that refers to a terminal capable of displaying Unicode <ins>via a native Unicode API</ins>, which is determined in an implementation-defined manner, <ins>flushes `os` and then</ins> writes `out` to the terminal using the native Unicode API; if `out` contains invalid code units, the behavior is undefined<del> and implementations are encouraged to diagnose it</del>. <del>If the native Unicode API is used, the function flushes `os` before writing `out`.</del> Otherwise, (if `os` is not such a stream or the function is `vprint_nonunicode`), inserts the character sequence [`out.begin()`, `out.end()`) into `os`. If writing to the terminal or inserting into `os` fails, calls `os.setstate(ios_base::badbit)` (which may throw `ios_base::failure`).
-4- Recommended practice: For `vprint_unicode`, if invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion. <ins>If invoking the native Unicode API does not require transcoding, implementations are encouraged to diagnose invalid code units.</ins>
Modify [print.fun] as indicated:
void vprint_unicode(FILE* stream, string_view fmt, format_args args);
-6- Preconditions: `stream` is a valid pointer to an output C stream.
-7- Effects: The function initializes an automatic variable via
string out = vformat(fmt, args);
If `stream` refers to a terminal capable of displaying Unicode <ins>via a native Unicode API</ins>, <ins>flushes `stream` and then</ins> writes `out` to the terminal using the native Unicode API; if `out` contains invalid code units, the behavior is undefined<del> and implementations are encouraged to diagnose it</del>. Otherwise writes `out` to `stream` unchanged. <del>If the native Unicode API is used, the function flushes `stream` before writing `out`.</del>
[Note 1: On <del>POSIX and </del>Windows, <ins>the native Unicode API is `WriteConsoleW` and </ins>`stream` referring to a terminal means that<del>, respectively, `isatty(fileno(stream))` and</del> `GetConsoleMode(_get_osfhandle(_fileno(stream)), ...)` return nonzero. — end note]
<del>[Note 2: On Windows, the native Unicode API is `WriteConsoleW`. — end note]</del>
-8- Throws: [...]
-9- Recommended practice: If invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion. <ins>If invoking the native Unicode API does not require transcoding, implementations are encouraged to diagnose invalid code units.</ins>
The effects for `vprint_unicode` say:
If `stream` refers to a terminal capable of displaying Unicode, writes `out` to the terminal using the native Unicode API; if `out` contains invalid code units, the behavior is undefined and implementations are encouraged to diagnose it. Otherwise writes `out` to `stream` unchanged. If the native Unicode API is used, the function flushes `stream` before writing `out`.
[Note 1: On POSIX and Windows, `stream` referring to a terminal means that, respectively, `isatty(fileno(stream))` and `GetConsoleMode(_get_osfhandle(_fileno(stream)), ...)` return nonzero. — end note]
[Note 2: On Windows, the native Unicode API is `WriteConsoleW`. — end note]
-8- Throws: [...]
-9- Recommended practice: If invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion.
The very explicit mention of `isatty` for POSIX platforms has confused at least two implementers into thinking that we're supposed to use `isatty`, and supposed to do something differently based on what it returns. That seems consistent with the nearly identical wording in [format.string.std] paragraph 12, which says "Implementations should use either UTF-8, UTF-16, or UTF-32, on platforms capable of displaying Unicode text in a terminal" and then has a note explicitly saying this is the case for Windows-based and many POSIX-based operating systems. So it seems clear that POSIX platforms are supposed to be considered to have "a terminal capable of displaying Unicode text", and so `std::print` should use `isatty` and then use a native Unicode API, and diagnose invalid code units.
This is a problem however, because `isatty` needs to make a system call on Linux, adding 500ns to every `std::print` call. This results in a 10x slowdown on Linux, where `std::print` can take just 60ns without the `isatty` check.
From discussions with Tom Honermann I learned that the "native Unicode API" wording is only relevant on Windows. This makes sense, because for POSIX platforms, writing to a terminal is done using the usual stdio functions, so there's no need to treat a terminal differently to any other file stream. And substitution of invalid code units with U+FFFD is recommended for Windows because that's what typical modern terminals do on POSIX platforms, so requiring the implementation to do that on Windows gives consistent behaviour. But the implementation doesn't need to do anything to make that happen with a POSIX terminal, it happens anyway.
So the `isatty` check is unnecessary for POSIX platforms, and the note mentioning it just causes confusion and has no benefit.
Secondly, there initially seems to be a contradiction between the "implementations are encouraged to diagnose it" wording and the later Recommended practice. In fact, there's no contradiction because the native Unicode API might accept UTF-8 and therefore require no transcoding, and so the Recommended practice wouldn't apply. The intention is that diagnosing invalid UTF-8 is still desirable in this case, but how should it be diagnosed? By writing an error to the terminal alongside the formatted string? Or by substituting U+FFFD maybe? If the latter is the intention, why is one suggestion in the middle of the Effects, and one given as Recommended practice?
The proposed resolution attempts to clarify that a "native Unicode API" is only needed if that's how you display Unicode on the terminal. It also moves the flushing requirement to be adjacent to the other requirements for systems using a native Unicode API instead of on its own later in the paragraph. And the suggestion to diagnose invalid code units is moved into the Recommended practice and clarified that it's only relevant if using a native Unicode API. I'm still not entirely happy with encouragement to diagnose invalid code units without giving any clue as to how that should be done. What does it mean to diagnose something at runtime? That's novel for the C++ standard. The way it's currently phrased seems to imply something other than U+FFFD substitution should be done, although that seems the most obvious implementation to me.
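
For readers unfamiliar with U+FFFD substitution, the following is a minimal sketch of what it could look like for UTF-8 input, replacing each maximal ill-formed subpart with one U+FFFD as described in the Unicode Standard's "U+FFFD Substitution in Conversion" section. The function name `replace_invalid_utf8` is hypothetical, and this is only one way to do it, not what any particular implementation or terminal does.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: returns a copy of `in` where every maximal ill-formed
// UTF-8 subsequence is replaced by U+FFFD (encoded as EF BF BD).
std::string replace_invalid_utf8(std::string_view in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const auto b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;                 // expected sequence length, 0 = invalid lead byte
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
        if (b0 <= 0x7F) {
            len = 1;
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;       // reject overlong encodings
            if (b0 == 0xED) hi = 0x9F;       // reject surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;       // reject overlong encodings
            if (b0 == 0xF4) hi = 0x8F;       // reject values above U+10FFFF
        }
        if (len == 0) {                      // stray continuation or invalid lead byte
            out += "\xEF\xBF\xBD";
            ++i;
            continue;
        }
        std::size_t j = 1;                   // number of bytes accepted so far
        for (; j < len && i + j < in.size(); ++j) {
            const auto b = static_cast<unsigned char>(in[i + j]);
            const unsigned char l = (j == 1) ? lo : 0x80;
            const unsigned char h = (j == 1) ? hi : 0xBF;
            if (b < l || b > h) break;
        }
        if (j == len) {
            out.append(in.substr(i, len));   // well-formed: copy through unchanged
        } else {
            out += "\xEF\xBF\xBD";           // one U+FFFD per maximal subpart
        }
        i += j;
    }
    return out;
}
```

Well-formed input is returned unchanged, so a function like this could sit in front of a native API that requires no transcoding.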
History
Date | User | Action | Args |
---|---|---|---|
2024-11-19 16:09:07 | admin | set | status: ready -> voting |
2024-06-24 22:35:48 | admin | set | messages: + msg14213 |
2024-06-24 22:35:48 | admin | set | status: open -> ready |
2024-03-19 09:55:06 | admin | set | messages: + msg14022 |
2024-03-13 17:41:18 | admin | set | messages: + msg14007 |
2024-03-13 17:41:18 | admin | set | messages: + msg14006 |
2024-03-13 17:41:18 | admin | set | status: new -> open |
2024-01-24 20:15:39 | admin | set | messages: + msg13931 |
2024-01-24 00:00:00 | admin | create | |