Title
Handling of multi-character collating elements by the regex FSM is underspecified
Status
new
Section
[re.grammar]
Submitter
Hubert Tong

Created on 2017-06-25.00:00:00 last changed 82 months ago

Messages

Date: 2017-07-12.01:35:02

[ 2017-07 Toronto Monday issue prioritization ]

Priority 4

Date: 2017-06-25.00:00:00

In N4660 subclause 31.13 [re.grammar] paragraph 5:

The productions ClassAtomExClass, ClassAtomCollatingElement and ClassAtomEquivalence provide functionality equivalent to that of the same features in regular expressions in POSIX.

The broad wording of the above statement makes it read like a mere statement of intent; however, it appears to be a necessary normative statement, because it is what identifies the general semantics associated with the syntactic forms it names. In any case, if ClassAtomCollatingElement is meant to provide functionality equivalent to a collating symbol in a POSIX bracket expression, multi-character collating elements need to be considered.
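For illustration, the sketch below exercises ClassAtomCollatingElement through std::regex with default-constructed traits, which use the global locale (the classic "C" locale unless the program changes it). In that locale every collating element is a single character, so `[[.a.]]` is expected to match "a", while `[[.ch.]]` names no collating element and is expected to be rejected with `error_collate`. The helper functions `matches_bracket` and `rejected_as_collate_error` are hypothetical names introduced only for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: does the whole of `text` match `pattern`
// (ECMAScript grammar, traits imbued with the global locale)?
bool matches_bracket(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// Hypothetical helper: true if compiling `pattern` fails with error_collate,
// i.e. the traits class did not recognise the name inside [[. .]].
bool rejected_as_collate_error(const std::string& pattern) {
    try {
        std::regex re(pattern);
        (void)re;  // only construction matters here
    } catch (const std::regex_error& e) {
        return e.code() == std::regex_constants::error_collate;
    }
    return false;
}
```

In a locale where "ch" is a collating element, `regex_traits::lookup_collatename` may legitimately return a sequence of more than one character for it; once it does, nothing in the wording says how the state machine is to consume that element.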

In [re.grammar] paragraph 14:

The behavior of the internal finite state machine representation when used to match a sequence of characters is as described in ECMA-262. The behavior is modified according to any match_flag_type flags specified when using the regular expression object in one of the regular expression algorithms. The behavior is also localized by interaction with the traits class template parameter as follows: [bullets 14.1 to 14.4]

None of the bullets clearly specifies how multi-character collating elements are handled:

  • 14.1 deals in characters.

  • 14.2 deals in characters (traits_inst.translate accepts only a single character).

  • 14.3 might handle a multi-character collating element; however, there is no specification of how such a collating element is to be identified from the sequence of characters. Additionally, the definition of primary equivalence class specifies that it is a set of characters (not of collating elements).

  • 14.4 deals in characters.
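The gap noted for 14.3 can be made concrete with a small model. The function `split_longest` below is hypothetical and not part of <regex>: it segments a character sequence into collating elements given the multi-character elements a locale defines, using an assumed longest-match policy. The element set {"ch", "ll"} imitates traditional Spanish collation and is likewise only an illustration; the standard currently prescribes neither this policy nor any other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: split `text` into collating elements. At each
// position take the longest element of `multi` that matches there,
// otherwise fall back to a single character (assumed policy).
std::vector<std::string> split_longest(const std::string& text,
                                       const std::set<std::string>& multi) {
    std::size_t max_len = 1;
    for (const auto& e : multi) {
        assert(!e.empty());  // an empty element would never advance
        max_len = std::max(max_len, e.size());
    }
    std::vector<std::string> out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t take = 1;  // default: one character
        for (std::size_t len = std::min(max_len, text.size() - pos); len > 1; --len) {
            if (multi.count(text.substr(pos, len)) != 0) {
                take = len;
                break;
            }
        }
        out.push_back(text.substr(pos, take));
        pos += take;
    }
    return out;
}
```

Under this assumed policy, "chico" segments as "ch", "i", "c", "o", so an element-wise comparison against a bracket expression `[c]` would fail at position 0, whereas the character-wise reading of 14.1, 14.2 and 14.4 matches the leading "c". The two readings give different results, and the current wording does not choose between them.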

The ECMA-262 specification for ClassRanges also deals in characters.

History
Date                 User   Action  Args
2017-07-12 01:35:02  admin  set     messages: + msg9349
2017-06-25 00:00:00  admin  create