Title
regex named character classes and case-insensitivity don't mix
Status
cd1
Section
[re]
Submitter
Eric Niebler

Created on 2005-07-01.00:00:00, last changed 164 months ago

Messages

Date: 2010-10-21.18:28:33

Proposed resolution:

Adopt the proposed resolution in N2409.

Date: 2005-07-01.00:00:00

This defect is also being discussed on the Boost developers list. The full discussion can be found here: http://lists.boost.org/boost/2005/07/29546.php

-- Begin original message --

Also, I may have found another issue, closely related to the one under discussion. It regards case-insensitive matching of named character classes. The regex_traits<> provides two functions for working with named char classes: lookup_classname and isctype. To match a char class such as [[:alpha:]], you pass "alpha" to lookup_classname and get a bitmask. Later, you pass a char and the bitmask to isctype and get a bool yes/no answer.
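The two-step lookup described above can be sketched as follows. This is a minimal illustration using std::regex_traits<char> as it was later standardized in C++11, on the assumption that it mirrors the interface under discussion; in_named_class is a hypothetical helper name, not part of any library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `ch` belongs to the named
// character class (for example "alpha" or "digit").
bool in_named_class(char ch, const std::string& name) {
    std::regex_traits<char> traits;

    // Step 1: translate the class name into an implementation-defined bitmask.
    std::regex_traits<char>::char_class_type mask =
        traits.lookup_classname(name.begin(), name.end());

    // Step 2: ask the traits whether the character is in that class.
    return traits.isctype(ch, mask);
}
```

A regex engine would normally perform step 1 once, when the pattern is compiled, and step 2 for each candidate character during matching.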

But how does case-insensitivity work in this scenario? Suppose we're doing a case-insensitive match on [[:lower:]]. It should behave as if it were [[:lower:][:upper:]], right? But there doesn't seem to be enough smarts in the regex_traits interface to do this.
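To make the gap concrete, here is a sketch of what an engine would have to do by hand for the two classes it already knows about. It assumes the C++11 std::regex_traits<char> interface and the default "C" locale; the helper names are hypothetical. The widening works only because the engine knows that "lower" and "upper" are case counterparts, which is exactly the knowledge it lacks for a user-defined class.

```cpp
#include <regex>
#include <string>

using Traits = std::regex_traits<char>;

// Hypothetical helper: case-sensitive lookup of a class name.
Traits::char_class_type class_mask(const Traits& t, const std::string& name) {
    return t.lookup_classname(name.begin(), name.end());
}

// Plain [[:lower:]]: an uppercase letter is rejected.
bool matches_lower_exact(char ch) {
    Traits t;
    return t.isctype(ch, class_mask(t, "lower"));
}

// Hand-written widening: treat [[:lower:]] as [[:lower:][:upper:]].
bool matches_lower_ignoring_case(char ch) {
    Traits t;
    Traits::char_class_type widened =
        class_mask(t, "lower") | class_mask(t, "upper");
    return t.isctype(ch, widened);
}
```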

Imagine I write a traits class which recognizes [[:fubar:]], and the "fubar" char class happens to be case-sensitive. How is the regex engine to know that? And how should it do a case-insensitive match of a character against the [[:fubar:]] char class? John, can you confirm this is a legitimate problem?

I see two options:

1) Add a bool icase parameter to lookup_classname. Then, lookup_classname( "upper", true ) will know to return lower|upper instead of just upper.

2) Add an isctype_nocase function.

I prefer (1) because the extra computation happens at the time the pattern is compiled rather than when it is executed.

-- End original message --

For what it's worth, John has also expressed his preference for option (1) above.
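The two options can be contrasted in a short sketch. For option (1), the code relies on the fact that std::regex_traits<char>::lookup_classname in C++11 takes a trailing bool icase parameter, which has the same shape as the proposal. For option (2), isctype_nocase is written here as a hypothetical free function with assumed semantics (test both case variants of the character); the original message does not specify how it would behave. compile_class is also a hypothetical name.

```cpp
#include <cctype>
#include <regex>
#include <string>

using Traits = std::regex_traits<char>;

// Option (1): resolve case-insensitivity once, when the pattern is compiled.
// The icase flag is forwarded to the traits, which return a suitable mask.
Traits::char_class_type compile_class(const Traits& t,
                                      const std::string& name,
                                      bool icase) {
    return t.lookup_classname(name.begin(), name.end(), icase);
}

// Option (2), hypothetical: keep the case-sensitive mask and do the extra
// work on every character test at match time.
bool isctype_nocase(const Traits& t, char ch, Traits::char_class_type mask) {
    const unsigned char u = static_cast<unsigned char>(ch);
    const char lower = static_cast<char>(std::tolower(u));
    const char upper = static_cast<char>(std::toupper(u));
    return t.isctype(lower, mask) || t.isctype(upper, mask);
}
```

With option (1) the per-character test stays a single isctype call, which is the compile-time versus match-time trade-off mentioned in the original message.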

History
Date User Action Args
2010-10-21 18:28:33 admin set messages: + msg2940
2005-07-01 00:00:00 admin create