Title
Two-digit formatting of negative year is ambiguous
Status
new
Section
[time.format][time.parse]
Submitter
Matt Stephanson

Created on 2022-11-18.00:00:00 last changed 6 days ago

Messages

Date: 2022-11-30.10:08:16

Proposed resolution:

This wording is relative to N4917.

[Drafting Note: Two mutually exclusive options are prepared, depicted below by Option A and Option B, respectively.]

Option A: This is Howard Hinnant's choice (3)

  1. Modify [time.format], Table [tab:time.format.spec] as indicated:

    Table 102 — Meaning of conversion specifiers [tab:time.format.spec]
    Specifier Replacement
    […]
    %y The last two decimal digits of the yearremainder after dividing the year by 100 using floored division.
    If the result is a single digit it is prefixed by 0.
    The modified command %Oy produces the locale's alternative representation. The
    modified command %Ey produces the locale's alternative representation of offset from
    %EC (year only).
    […]
  2. Modify [time.parse], Table [tab:time.parse.spec] as indicated:

    Table 103 — Meaning of parse flags [tab:time.parse.spec]
    Flag Parsed value
    […]
    %y The last two decimal digits of the yearremainder after dividing the year by 100 using floored division.
    If the century is not otherwise specified (e.g.
    with %C), values in the range [69, 99] are presumed to refer to the years 1969 to 1999,
    and values in the range [00, 68] are presumed to refer to the years 2000 to 2068. The
    modified command %N y specifies the maximum number of characters to read. If N is
    not specified, the default is 2. Leading zeroes are permitted but not required. The
    modified commands %Ey and %Oy interpret the locale's alternative representation.
    […]

Option B: This is Howard Hinnant's choice (1)

  1. Modify [time.format], Table [tab:time.format.spec] as indicated:

    Table 102 — Meaning of conversion specifiers [tab:time.format.spec]
    Specifier Replacement
    […]
    %y The last two decimal digits of the year, regardless of the sign of the year.
    If the result is a single digit it is prefixed by 0.
    The modified command %Oy produces the locale's alternative representation. The
    modified command %Ey produces the locale's alternative representation of offset from
    %EC (year only).
    [Example ?: cout << format("{:%C %y}", -1976y); prints -20 76. — end example]
    […]
  2. Modify [time.parse], Table [tab:time.parse.spec] as indicated:

    Table 103 — Meaning of parse flags [tab:time.parse.spec]
    Flag Parsed value
    […]
    %y The last two decimal digits of the year, regardless of the sign of the year.
    If the century is not otherwise specified (e.g.
    with %C), values in the range [69, 99] are presumed to refer to the years 1969 to 1999,
    and values in the range [00, 68] are presumed to refer to the years 2000 to 2068. The
    modified command %N y specifies the maximum number of characters to read. If N is
    not specified, the default is 2. Leading zeroes are permitted but not required. The
    modified commands %Ey and %Oy interpret the locale's alternative representation.
    [Example ?: year y; istringstream{"-20 76"} >> parse("%3C %y", y); results in
    y == -1976y. — end example]
    […]
Date: 2022-11-15.00:00:00

[ 2022-11-30; Reflector poll ]

Set priority to 3 after reflector poll.

A few votes for priority 2. Might need to go to LEWG.

Date: 2022-11-18.00:00:00

An issue has been identified regarding the two-digit formatting of negative years according to Table [tab:time.format.spec] ([time.format]):

cout << format("{:%y} ", 1976y)  // "76"
     << format("{:%y}", -1976y); // also "76"?

The relevant wording is

The last two decimal digits of the year. If the result is a single digit it is prefixed by 0. The modified command %Oy produces the locale's alternative representation. The modified command %Ey produces the locale's alternative representation of offset from %EC (year only).

MSVC STL treats the regular modified form symmetrically. Just as %Ey is the offset from %EC, so %y is the offset from %C, which is itself "[t]he year divided by 100 using floored division." (emphasis added). Because -1976 is the 24th year of the -20th century, the above code will print "76 24" using MSVC STL. However, many users expect, and libc++ gives, a result based on the literal wording, "76 76".

IEEE 1003.1-2008 strftime expects the century to be nonnegative, but the glibc implementation prints 24 for -1976. My own opinion is that this is the better result, because it consistently interprets %C and %y as the quotient and remainder of floored division by 100.

Howard Hinnant, coauthor of the original [time.format] wording in P0355 adds:

On the motivation for this design it is important to remember a few things:

  • POSIX strftime/strptime doesn't handle negative years in this department, so this is an opportunity for an extension in functionality.

  • This is a formatting/parsing issue, as opposed to a computational issue. This means that human readability of the string syntax is the most important aspect. Computational simplicity takes a back seat (within reason).

  • %C can't be truncated division, otherwise the years [-99, -1] would map to the same century as the years [0, 99]. So floored division is a pretty easy and obvious solution.

  • %y is obvious for non-negative years: The last two decimal digits, or y % 100.

This leaves how to represent negative years with %y. I can think of 3 options:

  1. Use the last two digits without negating: -1976 → 76.

  2. Use the last two digits and negate it: -1976 → -76.

  3. Use floored modulus arithmetic: -1976 → 24.

The algorithm to convert %C and %y into a year is not important to the client because these are both strings, not integers. The client will do it with parse, not 100*C + y.

I discounted solution 3 as not sufficiently obvious. If the output for -1976 was 23, the human reader wouldn't immediately know that this is off by 1. The reader is expecting the POSIX spec:

the last two digits of the year as a decimal number [00,99].

24 just doesn't cut it.

That leaves solution 1 or 2. I discounted solution 2 because having the negative in 2 places (the %C and %y) seemed overly complicated and more error prone. The negative sign need only be in one place, and it has to be in %C to prevent ambiguity.

That leaves solution 1. I believe this is the solution for an extension of the POSIX spec to negative years with the property of least surprise to the client. The only surprise is in %C, not %y, and the surprise in %C seems unavoidable.

History
Date User Action Args
2022-11-30 10:08:16adminsetmessages: + msg13127
2022-11-19 14:28:44adminsetmessages: + msg13106
2022-11-18 00:00:00admincreate