Date
2007-09-09.00:00:00
Message id
383

Content

At least a couple of places in the IS state that indirection through a null pointer produces undefined behavior: 6.9.1 [intro.execution] paragraph 4 gives "dereferencing the null pointer" as an example of undefined behavior, and 9.3.4.3 [dcl.ref] paragraph 4 (in a note) uses this supposedly undefined behavior as justification for the nonexistence of "null references."

However, 7.6.2.2 [expr.unary.op] paragraph 1, which describes the unary "*" operator, does not say that the behavior is undefined if the operand is a null pointer, as one might expect. Furthermore, at least one passage gives dereferencing a null pointer well-defined behavior: 7.6.1.8 [expr.typeid] paragraph 2 says

If the lvalue expression is obtained by applying the unary * operator to a pointer and the pointer is a null pointer value (7.3.12 [conv.ptr]), the typeid expression throws the bad_typeid exception (17.7.5 [bad.typeid]).

This is inconsistent and should be cleaned up.

Bill Gibbons:

At one point we agreed that dereferencing a null pointer was not undefined; only using the resulting value had undefined behavior.

For example:

    char *p = 0;
    char *q = &*p;

Similarly, dereferencing a pointer to the end of an array should be allowed as long as the value is not used:

    char a[10];
    char *b = &a[10];   // equivalent to "char *b = &*(a+10);"

Both cases come up often enough in real code that they should be allowed.

Mike Miller:

I can see the value in this, but it doesn't seem to be well reflected in the wording of the Standard. For instance, presumably *p above would have to be an lvalue in order to be the operand of "&", but the definition of "lvalue" in 7.2.1 [basic.lval] paragraph 2 says that "an lvalue refers to an object." What's the object in *p? If we were to allow this, we would need to augment the definition to include the result of dereferencing null and one-past-the-end-of-array.

Tom Plum:

Just to add one more recollection of the intent: I was very happy when (I thought) we decided that it was only the attempt to actually fetch a value that creates undefined behavior. The words which (I thought) were intended to clarify that are the first three sentences of the lvalue-to-rvalue conversion, 7.3.2 [conv.lval]:

An lvalue (7.2.1 [basic.lval]) of a non-function, non-array type T can be converted to an rvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

In other words, it is only the act of "fetching", of lvalue-to-rvalue conversion, that triggers the ill-formed or undefined behavior. Simply forming the lvalue expression, and then for example taking its address, does not trigger either of those errors. I described this approach to WG14 and it may have been incorporated into C 1999.

Mike Miller:

If we admit the possibility of null lvalues, as Tom is suggesting here, that significantly undercuts the rationale for prohibiting "null references" -- what is a reference, after all, but a named lvalue? If it's okay to create a null lvalue, as long as I don't invoke the lvalue-to-rvalue conversion on it, why shouldn't I be able to capture that null lvalue as a reference, with the same restrictions on its use?

I am not arguing in favor of null references. I don't want them in the language. What I am saying is that we need to think carefully about adopting the permissive approach of saying that it's all right to create null lvalues, as long as you don't use them in certain ways. If we do that, it will be very natural for people to question why they can't pass such an lvalue to a function, as long as the function doesn't do anything that is not permitted on a null lvalue.

If we want to allow &*(p=0), maybe we should change the definition of "&" to handle dereferenced null specially, just as typeid has special handling, rather than changing the definition of lvalue to include dereferenced nulls, and similarly for the array_end+1 case. It's not as general, but I think it might cause us fewer problems in the long run.