Matching of null characters by regular expressions is underspecified
Jonathan Wakely

Created on 2021-09-27.00:00:00; last changed 13 months ago


Date: 2021-10-15.00:00:00

[ 2021-10-14; Reflector poll ]

Set priority to 3 after reflector poll.

Date: 2021-10-02.16:57:00

ECMAScript says that \0 is an ordinary character and can be matched. POSIX says the opposite:

"The interfaces specified in POSIX.1-2017 do not permit the inclusion of a NUL character in an RE or in the string to be matched. If during the operation of a standard utility a NUL is included in the text designated to be matched, that NUL may designate the end of the text string for the purposes of matching."

So does that mean std::regex{"", 1, regex::basic}, whose pattern is a single NUL character, should throw an exception?

And should std::regex_match(string{"a\0b", 3}, regex{"a.b", regex::basic}) fail?
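The two questions above can be probed on a given implementation with a minimal sketch like the following. The Outcome enum and the probe_* function names are hypothetical helpers introduced here for illustration. The sketch deliberately does not say which outcome is correct, since that is exactly what this issue leaves open, and results may differ between standard library implementations.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper type: the three things a probe can observe.
enum class Outcome { Rejected, Matched, NotMatched };

// Probe 1: a basic RE whose pattern is exactly one NUL character.
// The (pointer, length) constructor reads one char from "", i.e. the terminator.
Outcome probe_nul_in_pattern() {
    try {
        std::regex re("", 1, std::regex::basic);
        const std::string subject("\0", 1);  // one-character string holding NUL
        return std::regex_match(subject, re) ? Outcome::Matched
                                             : Outcome::NotMatched;
    } catch (const std::regex_error&) {
        return Outcome::Rejected;  // the implementation refused the pattern
    }
}

// Probe 2: an ordinary basic RE matched against a subject with an embedded NUL.
Outcome probe_nul_in_subject() {
    try {
        const std::string subject("a\0b", 3);  // three characters: 'a', NUL, 'b'
        std::regex re("a.b", std::regex::basic);
        return std::regex_match(subject, re) ? Outcome::Matched
                                             : Outcome::NotMatched;
    } catch (const std::regex_error&) {
        return Outcome::Rejected;
    }
}
```

Calling both probes and printing the results shows how a particular library resolves the question. An implementation following the POSIX wording strictly might report Rejected for the first probe and NotMatched for the second, but the standard does not currently require either.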

The POSIX rule exists because those interfaces are specified with NTBS (null-terminated byte string) arguments, so there's no way to distinguish "a\0b" from "a". The C++ interfaces could allow embedded NULs, but we never specify any divergence from POSIX, so presumably the rule still applies. Is that what was intended, and is it what we want?
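The NTBS limitation described above can be illustrated with a short sketch. The function names ntbs_length, counted_length, and same_as_ntbs are hypothetical helpers made up for this example. It only contrasts a null-terminated view of a buffer with a counted one and makes no claim about what the regex library does.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Length as seen through an NTBS interface: scanning stops at the first NUL.
std::size_t ntbs_length(const char* s) {
    return std::strlen(s);
}

// Length as seen through a counted interface such as std::string(ptr, n).
std::size_t counted_length(const char* s, std::size_t n) {
    return std::string(s, n).size();
}

// True if two buffers are indistinguishable when read as NTBS.
bool same_as_ntbs(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

With these helpers, ntbs_length("a\0b") is 1 and same_as_ntbs("a\0b", "a") is true, while counted_length("a\0b", 3) is 3. The counted C++ interfaces therefore can see the embedded NUL even though the POSIX interfaces cannot.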

Date                 User   Action  Args
2021-10-14 11:35:36  admin  set     messages: + msg12160
2021-09-27 00:00:00  admin  create