Title
"ASCII" is not a registered character encoding
Status
wp
Section
[text.encoding.general]
Submitter
Jonathan Wakely

Created on 2024-01-23.00:00:00 last changed 3 weeks ago

Messages

Date: 2024-04-02.10:29:12

Proposed resolution:

This wording is relative to N4971.

  1. Modify [text.encoding.general] as indicated:

    -1- A registered character encoding is a character encoding scheme in the IANA Character Sets registry.

    [Note 1: The IANA Character Sets registry uses the term “character sets” to refer to character encodings. — end note]

    The primary name of a registered character encoding is the name of that encoding specified in the IANA Character Sets registry.

    -2- The set of known registered character encodings contains every registered character encoding specified in the IANA Character Sets registry except for the following:

    (2.1) – NATS-DANO (33)
    (2.2) – NATS-DANO-ADD (34)

    -3- Each known registered character encoding is identified by an enumerator in text_encoding::id, and has a set of zero or more aliases.

    -4- The set of aliases of a known registered character encoding is an implementation-defined superset of the aliases specified in the IANA Character Sets registry. The set of aliases for US-ASCII includes "ASCII". No two aliases or primary names of distinct registered character encodings are equivalent when compared by text_encoding::comp-name.

Date: 2024-04-02.10:29:12

[ Tokyo 2024-03-23; Status changed: Voting → WP. ]

Date: 2024-03-15.00:00:00

[ 2024-03-12; Reflector poll ]

SG16 approved the proposed resolution. Set status to Tentatively Ready after seven votes in favour during reflector poll.

Date: 2024-01-23.00:00:00

The IANA Character Sets registry does not contain "ASCII" as an alias of the "US-ASCII" encoding. This is apparently for historical reasons, because there used to be some ambiguity about exactly what "ASCII" meant. I don't think those historical reasons are relevant to C++26, but the absence of "ASCII" in the IANA registry means that it's not a registered character encoding as defined by [text.encoding.general].

This means that the encoding referred to by notes in the C++ standard ([fs.path.generic], [facet.numpunct.virtuals]) and by an example in the std::text_encoding proposal (P1885) isn't actually usable in portable code. So std::text_encoding("ASCII") creates an object with mib() == std::text_encoding::other, which is not the same encoding as std::text_encoding("US-ASCII"). This seems surprising.
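The effect can be illustrated without a C++26 library. The following is a minimal sketch, not the standard's or any implementation's code: normalize and comp_name are a simplified approximation of text_encoding::comp-name (ignore non-alphanumerics, case, and leading zeros), iana_aliases is a small hand-picked subset of the registry, and lookup and its with_ascii_alias flag are hypothetical names used only to contrast the behaviour before and after the proposed resolution. The values 1, 3 and 106 are the MIBenum values for other, US-ASCII and UTF-8.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Simplified approximation of comp-name: drop characters that are not
// letters or digits, fold case, and drop zeros that do not follow a digit.
std::string normalize(std::string_view name) {
    std::string out;
    bool in_number = false;  // true while the previous kept character was a digit
    for (unsigned char c : name) {
        if (std::isdigit(c)) {
            if (c == '0' && !in_number) continue;  // leading zero, ignored
            in_number = true;
            out.push_back(static_cast<char>(c));
        } else if (std::isalpha(c)) {
            in_number = false;
            out.push_back(static_cast<char>(std::tolower(c)));
        }
        // anything else ('-', '_', '.', ':') is ignored
    }
    return out;
}

bool comp_name(std::string_view a, std::string_view b) {
    return normalize(a) == normalize(b);
}

constexpr int mib_other = 1;    // text_encoding::id::other
constexpr int mib_ascii = 3;    // US-ASCII
constexpr int mib_utf8  = 106;  // UTF-8

struct Alias { std::string_view name; int mib; };

// Hand-picked subset of the IANA names and aliases (assumption: enough for
// the illustration, far from the full registry). Note: no plain "ASCII".
constexpr std::array<Alias, 6> iana_aliases{{
    {"US-ASCII", mib_ascii}, {"ANSI_X3.4-1968", mib_ascii},
    {"ISO646-US", mib_ascii}, {"csASCII", mib_ascii},
    {"UTF-8", mib_utf8}, {"csUTF8", mib_utf8},
}};

// Hypothetical lookup: with_ascii_alias models the proposed resolution,
// which requires "ASCII" to be an alias of US-ASCII.
int lookup(std::string_view name, bool with_ascii_alias) {
    for (const Alias& a : iana_aliases)
        if (comp_name(a.name, name)) return a.mib;
    if (with_ascii_alias && comp_name("ASCII", name)) return mib_ascii;
    return mib_other;
}

// Usage: the behaviour described in the issue, before and after the fix.
void check_examples() {
    assert(lookup("US-ASCII", false) == mib_ascii);
    assert(lookup("ASCII", false) == mib_other);  // the surprising case
    assert(lookup("ASCII", true) == mib_ascii);   // with the added alias
}
```

With only the registry's aliases, "ASCII" falls through to other even though "us-ascii" and "csASCII" match; adding the single alias removes the discrepancy without affecting any other name in this subset.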

History
Date User Action Args
2024-04-02 10:29:12 admin set messages: + msg14047
2024-04-02 10:29:12 admin set status: voting -> wp
2024-03-18 09:32:04 admin set status: ready -> voting
2024-03-12 01:10:06 admin set messages: + msg13998
2024-03-12 01:10:06 admin set status: new -> ready
2024-01-23 13:57:27 admin set messages: + msg13929
2024-01-23 00:00:00 admin create