Issue 3831: Two-digit formatting of negative year is ambiguous

Title: Two-digit formatting of negative year is ambiguous
Status: new
Section: [time.format][time.parse]
Submitter: Matt Stephanson

Created on 2022-11-18.00:00:00 last changed 31 months ago

Messages

Date: 2022-11-30.10:08:16

Proposed resolution:

This wording is relative to N4917.

[Drafting Note: Two mutually exclusive options are prepared, depicted below by Option A and Option B, respectively.]

Option A: This is Howard Hinnant's choice (3)

Modify [time.format], Table [tab:time.format.spec] as indicated:

Table 102 — Meaning of conversion specifiers [tab:time.format.spec]
Specifier Replacement

[…]

%y The ~~last two decimal digits of the year~~remainder after dividing the year by 100 using floored division.
If the result is a single digit it is prefixed by 0.
The modified command %Oy produces the locale's alternative representation. The
modified command %Ey produces the locale's alternative representation of offset from
%EC (year only).

[…]

Modify [time.parse], Table [tab:time.parse.spec] as indicated:

Table 103 — Meaning of parse flags [tab:time.parse.spec]
Flag Parsed value

[…]

%y The ~~last two decimal digits of the year~~remainder after dividing the year by 100 using floored division.
If the century is not otherwise specified (e.g.
with %C), values in the range [69, 99] are presumed to refer to the years 1969 to 1999,
and values in the range [00, 68] are presumed to refer to the years 2000 to 2068. The
modified command %Ny specifies the maximum number of characters to read. If N is
not specified, the default is 2. Leading zeroes are permitted but not required. The
modified commands %Ey and %Oy interpret the locale's alternative representation.

[…]
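
The century-defaulting rule quoted in the %y row can be illustrated with a minimal sketch. The helper name `year_from_two_digits` is hypothetical and is not part of the proposed wording or of any library; it only restates the [69, 99] and [00, 68] ranges from the table.

```cpp
// Hypothetical helper (not a standard API): maps a two-digit %y value to a
// full year when no century has been specified, following the ranges quoted
// in the parse table.
constexpr int year_from_two_digits(int yy) {
    // [69, 99] -> 1969..1999, [00, 68] -> 2000..2068
    return yy >= 69 ? 1900 + yy : 2000 + yy;
}

static_assert(year_from_two_digits(69) == 1969, "lower bound of the 19xx range");
static_assert(year_from_two_digits(99) == 1999, "upper bound of the 19xx range");
static_assert(year_from_two_digits(0)  == 2000, "lower bound of the 20xx range");
static_assert(year_from_two_digits(68) == 2068, "upper bound of the 20xx range");
```

The sketch assumes its argument is already in [0, 99]. Neither option changes this defaulting rule.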

Option B: This is Howard Hinnant's choice (1)

Modify [time.format], Table [tab:time.format.spec] as indicated:

Table 102 — Meaning of conversion specifiers [tab:time.format.spec]
Specifier Replacement

[…]

%y The last two decimal digits of the year, regardless of the sign of the year.
If the result is a single digit it is prefixed by 0.
The modified command %Oy produces the locale's alternative representation. The
modified command %Ey produces the locale's alternative representation of offset from
%EC (year only).
[Example ?: cout << format("{:%C %y}", -1976y); prints -20 76. — end example]

[…]

Modify [time.parse], Table [tab:time.parse.spec] as indicated:

Table 103 — Meaning of parse flags [tab:time.parse.spec]
Flag Parsed value

[…]

%y The last two decimal digits of the year, regardless of the sign of the year.
If the century is not otherwise specified (e.g.
with %C), values in the range [69, 99] are presumed to refer to the years 1969 to 1999,
and values in the range [00, 68] are presumed to refer to the years 2000 to 2068. The
modified command %Ny specifies the maximum number of characters to read. If N is
not specified, the default is 2. Leading zeroes are permitted but not required. The
modified commands %Ey and %Oy interpret the locale's alternative representation.
[Example ?: year y; istringstream{"-20 76"} >> parse("%3C %y", y); results in
y == -1976y. — end example]

[…]

msg13106 (view)

Date: 2022-11-15.00:00:00

[ 2022-11-30; Reflector poll ]

Set priority to 3 after reflector poll.

A few votes for priority 2. Might need to go to LEWG.

msg13105 (view)

Date: 2022-11-18.00:00:00

An issue has been identified regarding the two-digit formatting of negative years according to Table [tab:time.format.spec] ([time.format]):

cout << format("{:%y} ", 1976y)  // "76"
     << format("{:%y}", -1976y); // also "76"?

The relevant wording is

The last two decimal digits of the year. If the result is a single digit it is prefixed by 0. The modified command %Oy produces the locale's alternative representation. The modified command %Ey produces the locale's alternative representation of offset from %EC (year only).

MSVC STL treats the regular modified form symmetrically. Just as %Ey is the offset from %EC, so %y is the offset from %C, which is itself "[t]he year divided by 100 using floored division." (emphasis added). Because -1976 is the 24th year of the -20th century, the above code will print "76 24" using MSVC STL. However, many users expect, and libc++ gives, a result based on the literal wording, "76 76".

IEEE 1003.1-2008 strftime expects the century to be nonnegative, but the glibc implementation prints 24 for -1976. My own opinion is that this is the better result, because it consistently interprets %C and %y as the quotient and remainder of floored division by 100.
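
The quotient/remainder reading described above can be checked with plain integer arithmetic. This is only a sketch: `floor_div` and `floor_mod` are hypothetical helper names, not functions from any standard library implementation, and the code does not call the chrono formatting machinery at all.

```cpp
// Hypothetical helpers illustrating floored division by 100.
// C++ integer division truncates toward zero, so the quotient is stepped
// down by one when the operands have opposite signs and the division is
// inexact.
constexpr int floor_div(int a, int b) {
    return a / b - ((a % b != 0 && ((a < 0) != (b < 0))) ? 1 : 0);
}

// Remainder paired with floor_div; for a positive divisor it is never negative.
constexpr int floor_mod(int a, int b) {
    return a - floor_div(a, b) * b;
}

// 1976: century 19, offset 76.
static_assert(floor_div(1976, 100) == 19, "quotient for a positive year");
static_assert(floor_mod(1976, 100) == 76, "remainder for a positive year");

// -1976: century -20, offset 24, since -20 * 100 + 24 == -1976.
static_assert(floor_div(-1976, 100) == -20, "quotient for a negative year");
static_assert(floor_mod(-1976, 100) == 24, "remainder for a negative year");
```

Under this reading, `100 * floor_div(y, 100) + floor_mod(y, 100) == y` holds for every year, which is the consistency argument made above.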

Howard Hinnant, coauthor of the original [time.format] wording in P0355 adds:

On the motivation for this design it is important to remember a few things:

POSIX strftime/strptime doesn't handle negative years in this department, so this is an opportunity for an extension in functionality.

This is a formatting/parsing issue, as opposed to a computational issue. This means that human readability of the string syntax is the most important aspect. Computational simplicity takes a back seat (within reason).

%C can't be truncated division, otherwise the years [-99, -1] would map to the same century as the years [0, 99]. So floored division is a pretty easy and obvious solution.
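
A small sketch of the collision described above, using hypothetical helper names (`century_truncated`, `century_floored`) that do not come from the wording or from any implementation:

```cpp
// Hypothetical helpers comparing the two candidate definitions of the century.
constexpr int century_truncated(int year) {
    return year / 100;  // C++ integer division truncates toward zero
}

constexpr int century_floored(int year) {
    // For negative years, round the magnitude up before negating so that
    // the result is floor(year / 100).
    return year >= 0 ? year / 100 : -((-year + 99) / 100);
}

// With truncation, -50 and 50 land in the same century...
static_assert(century_truncated(-50) == 0, "truncated century of -50");
static_assert(century_truncated(50) == 0, "truncated century of 50");

// ...while floored division keeps them apart.
static_assert(century_floored(-50) == -1, "floored century of -50");
static_assert(century_floored(50) == 0, "floored century of 50");
static_assert(century_floored(-1976) == -20, "floored century of -1976");
```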

%y is obvious for non-negative years: The last two decimal digits, or y % 100.

This leaves how to represent negative years with %y. I can think of 3 options:

1. Use the last two digits without negating: -1976 → 76.

2. Use the last two digits and negate it: -1976 → -76.

3. Use floored modulus arithmetic: -1976 → 24.
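
The three candidates can be compared side by side with integer arithmetic. The function names below are hypothetical labels for the options listed above, chosen for illustration only.

```cpp
// Hypothetical helpers, one per candidate rule for the %y value.

// Option 1: last two digits of the year, sign ignored.
constexpr int last_two_digits(int year) {
    return (year < 0 ? -year : year) % 100;
}

// Option 2: last two digits carrying the sign. In C++ the % operator
// truncates, so the result takes the sign of the dividend.
constexpr int signed_last_two_digits(int year) {
    return year % 100;
}

// Option 3: floored modulus, always in [0, 99].
constexpr int floored_modulus(int year) {
    return ((year % 100) + 100) % 100;
}

static_assert(last_two_digits(-1976) == 76, "option 1");
static_assert(signed_last_two_digits(-1976) == -76, "option 2");
static_assert(floored_modulus(-1976) == 24, "option 3");

// All three agree for non-negative years.
static_assert(last_two_digits(1976) == 76, "option 1, positive year");
static_assert(signed_last_two_digits(1976) == 76, "option 2, positive year");
static_assert(floored_modulus(1976) == 76, "option 3, positive year");
```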

The algorithm to convert %C and %y into a year is not important to the client because these are both strings, not integers. The client will do it with parse, not 100*C + y.

I discounted solution 3 as not sufficiently obvious. If the output for -1976 was 23, the human reader wouldn't immediately know that this is off by 1. The reader is expecting the POSIX spec:

the last two digits of the year as a decimal number [00,99].

24 just doesn't cut it.

That leaves solution 1 or 2. I discounted solution 2 because having the negative in 2 places (the %C and %y) seemed overly complicated and more error prone. The negative sign need only be in one place, and it has to be in %C to prevent ambiguity.

That leaves solution 1. I believe this is the solution for an extension of the POSIX spec to negative years with the property of least surprise to the client. The only surprise is in %C, not %y, and the surprise in %C seems unavoidable.
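
To illustrate why 100*C + y is not the right recombination under solution 1, here is one possible sketch of how a floored century and a sign-less two-digit value could be put back together. The helper `year_from_century_and_digits` is hypothetical and is an assumption of this sketch. It is not the algorithm of any implementation and not part of the proposed wording. It is only checked against the "-20 76" example from Option B.

```cpp
// Hypothetical helper: recombine a floored century (%C) with the last two
// digits of the year taken regardless of sign (%y under solution 1).
constexpr int year_from_century_and_digits(int century, int digits) {
    return century >= 0 ? century * 100 + digits   // non-negative years: plain sum
         : digits == 0  ? century * 100            // e.g. -20 and 00 -> -2000
                        : (century + 1) * 100 - digits;  // e.g. -20 and 76 -> -1976
}

static_assert(year_from_century_and_digits(19, 76) == 1976, "positive year");
static_assert(year_from_century_and_digits(-20, 76) == -1976, "Option B example");
static_assert(year_from_century_and_digits(-20, 0) == -2000, "century boundary");
static_assert(year_from_century_and_digits(-1, 1) == -1, "year -1");

// The naive sum gives a different year for the same pair of fields.
static_assert(-20 * 100 + 76 == -1924, "100*C + y is not -1976");
```

The branch for negative centuries exists because century -20 covers the years -2000 through -1901, so the digits count away from the upper end of that range. A client using parse never sees this arithmetic.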

History
Date	User	Action	Args
2022-11-30 10:08:16	admin	set	messages: + msg13127
2022-11-19 14:28:44	admin	set	messages: + msg13106
2022-11-18 00:00:00	admin	create