Issue 3343: Ordering of calls to unlock() and notify_all() in Effects element of notify_all_at_thread_exit() should be reversed

Title: Ordering of calls to unlock() and notify_all() in Effects element of notify_all_at_thread_exit() should be reversed
Status: open
Section: [thread.condition.nonmember]
Submitter: Lewis Baker

Created on 2019-11-21.00:00:00; last changed 2023-06-13.14:35:33

Messages

msg13625

Date: 2023-06-13.14:35:33

Proposed resolution:

This wording is relative to N4950.

Change [thread.condition.nonmember] as indicated:
```
void notify_all_at_thread_exit(condition_variable& cond, unique_lock<mutex> lk);
```
[…]
-2- Effects: Transfers ownership of the lock associated with lk into internal storage and schedules cond to be notified when the current thread exits~~,~~. This notification is sequenced after all objects of thread storage duration associated with the current thread have been destroyed~~. This notification~~ and is equivalent to:
```
lk.unlock();        // deleted
cond.notify_all();
lk.unlock();        // inserted
```
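To make the resolved ordering concrete, here is a minimal sketch, assuming C++17, of how a program might emulate "notify first, unlock second" by hand. Everything in namespace `sketch` (`exit_notifier`, `notify_all_then_unlock_at_thread_exit`, `demo`) is hypothetical and not part of the standard library. It is also not a drop-in equivalent of the proposed wording: the registry's destructor runs *during* destruction of the thread's thread-local objects, not after all of them.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical per-thread registry: holds (cv, lock) pairs until the
// owning thread exits, then notifies each cv BEFORE releasing its lock.
class exit_notifier {
public:
    void add(std::condition_variable& cv, std::unique_lock<std::mutex> lk) {
        pending_.emplace_back(&cv, std::move(lk));
    }
    ~exit_notifier() {
        for (auto& [cv, lk] : pending_) {
            cv->notify_all();  // waiter cannot see the state change yet...
            lk.unlock();       // ...until the mutex is released here
        }
        // The unique_locks in pending_ no longer own their mutexes, so
        // destroying them below does not touch the (possibly gone) mutexes.
    }
private:
    std::vector<std::pair<std::condition_variable*,
                          std::unique_lock<std::mutex>>> pending_;
};

// Hypothetical helper mirroring notify_all_at_thread_exit's signature.
inline void notify_all_then_unlock_at_thread_exit(
        std::condition_variable& cv, std::unique_lock<std::mutex> lk) {
    thread_local exit_notifier notifier;  // destroyed when this thread exits
    notifier.add(cv, std::move(lk));
}

// Usage: the worker is joined (not detached) so the demo itself is
// well-defined regardless of the race discussed in this issue.
inline bool demo() {
    std::condition_variable cv;
    std::mutex mut;
    bool complete = false;
    std::thread worker{[&] {
        std::unique_lock lk{mut};
        complete = true;
        notify_all_then_unlock_at_thread_exit(cv, std::move(lk));
    }};
    {
        std::unique_lock lk{mut};
        cv.wait(lk, [&] { return complete; });
    }
    worker.join();
    return complete;
}

}  // namespace sketch
```

Because the waiter can only observe `complete == true` after acquiring the mutex, and the mutex is released only after `notify_all()` has returned, the waiter never sees completion while the worker might still touch the condition variable.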

msg11147

Date: 2023-06-15.00:00:00

[ 2023-06-13, Varna; Tim provides improved wording ]

Addressed mailing list comments. Ask SG1 to check.

msg10891

Date: 2023-06-13.14:35:33

[ 2020-02, Prague ]

Response from SG1: "We discussed it in Prague. We agree it’s an error and SG1 agreed with the PR."

This wording is relative to N4842.

Change [thread.condition.nonmember] as indicated:
```
void notify_all_at_thread_exit(condition_variable& cond, unique_lock<mutex> lk);
```
[…]
-2- Effects: Transfers ownership of the lock associated with lk into internal storage and schedules cond to be notified when the current thread exits, after all objects of thread storage duration associated with the current thread have been destroyed. This notification is equivalent to:
```
lk.unlock();        // deleted
cond.notify_all();
lk.unlock();        // inserted
```

msg10875

Date: 2019-12-15.00:00:00

[ 2019-12-15; Daniel synchronizes wording with N4842 ]

msg10845

Date: 2019-12-08.00:00:00

[ 2019-12-08 Issue Prioritization ]

Priority to 3 after reflector discussion.

msg10844

Date: 2019-11-21.00:00:00

[thread.condition.nonmember] p2 states:

Effects: Transfers ownership of the lock associated with lk into internal storage and schedules cond to be notified when the current thread exits, after all objects of thread storage duration associated with the current thread have been destroyed. This notification shall be as if:
```
lk.unlock();
cond.notify_all();
```

One common use case for notify_all_at_thread_exit() is in conjunction with thread::detach(): it allows detached threads to signal when they complete, and allows another thread to wait for them to complete using the condition_variable/mutex pair.

However, the current wording for notify_all_at_thread_exit(condition_variable& cond, unique_lock<mutex> lk) makes it impossible to know when it is safe to destroy the condition_variable in the presence of spurious wake-ups and detached threads.

For example, consider the following code snippet:

```
#include <condition_variable>
#include <mutex>
#include <thread>

int main() {
  std::condition_variable cv;
  std::mutex mut;
  bool complete = false;

  std::thread{[&] {
    // do work here

    // Signal thread completion
    std::unique_lock lk{mut};
    complete = true;
    std::notify_all_at_thread_exit(cv, std::move(lk));
  }}.detach();

  // Wait until thread completes
  std::unique_lock lk{mut};
  cv.wait(lk, [&] { return complete; });

  // condition_variable destroyed on scope exit
  return 0;
}
```

This seems to be an intended usage of thread::detach() and std::notify_all_at_thread_exit(), and yet this code contains a race between the call to cv.notify_all() on the created thread and the destructor of the condition_variable.

To highlight the issue, consider the following case:

Let T0 be the thread that executes main() and T1 be the thread created by the std::thread construction.

T0: creates thread T1
T0: context-switched out by OS
T1: starts running

T1: acquires mutex lock
T1: sets complete = true

T1: calls notify_all_at_thread_exit()
T1: returns from thread-main function and runs all thread-local destructors
T1: calls lk.unlock()
T1: context-switched out by OS
T0: resumes execution
T0: acquires mutex lock
T0: calls cv.wait() which returns immediately as complete is true

T0: returns from main(), destroying condition_variable
T1: resumes execution

T1: calls cv.notify_all() on a dangling cv reference (undefined behaviour)

Other sequencings are possible involving spurious wake-ups of the cv.wait() call.

A proof-of-concept showing this issue can be found here.

The current wording requires releasing the mutex lock before calling cv.notify_all(). In the presence of spurious wake-ups of a condition_variable::wait(), there is no way to know whether or not a detached thread that called std::notify_all_at_thread_exit() has finished calling cv.notify_all(). This means there is no portable way to know when it will be safe for the waiting thread to destroy that condition_variable.

However, if we were to reverse the order of the calls to lk.unlock() and cond.notify_all(), then the thread waiting for the detached thread to exit could not observe the completion of that thread (in the above example, the assignment of true to complete) until the detached thread released the mutex and the waiting thread subsequently acquired it. That release would only happen after the call to cv.notify_all() had completed, so the above example would no longer race between a subsequent destruction of the condition_variable and the call to cv.notify_all().
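The argument in the two preceding paragraphs can be checked mechanically. The following is a brute-force model, not real threads, and the names `model::Step`, `any_bad_schedule` and `race_possible` are hypothetical. It enumerates every interleaving of T1's exit-time steps with T0's steps from the timeline above and reports whether any schedule calls `notify_all()` after T0 has destroyed the condition_variable. It assumes T1 already holds the mutex with `complete == true` when its exit-time steps begin. Spurious wake-ups do not add T0 steps here, because however T0 wakes it must still reacquire the mutex, after which `complete` is already true. Only the condition variable is modelled, not the mutex.

```cpp
#include <cstddef>
#include <vector>

namespace model {

enum class Step { unlock, notify, acquire, destroy };

struct State {
    bool mutex_held_by_t1 = true;  // T1 holds the lock when its exit begins
    bool cv_destroyed = false;
};

// Depth-first search over all interleavings of the remaining steps.
// Returns true as soon as one schedule notifies a destroyed cv.
bool any_bad_schedule(const std::vector<Step>& t1, std::size_t i,
                      const std::vector<Step>& t0, std::size_t j, State s) {
    // Option 1: run T1's next step.
    if (i < t1.size()) {
        if (t1[i] == Step::notify && s.cv_destroyed) return true;  // dangling cv
        State next = s;
        if (t1[i] == Step::unlock) next.mutex_held_by_t1 = false;
        if (any_bad_schedule(t1, i + 1, t0, j, next)) return true;
    }
    // Option 2: run T0's next step; acquiring blocks while T1 holds the mutex.
    if (j < t0.size()) {
        bool blocked = (t0[j] == Step::acquire && s.mutex_held_by_t1);
        if (!blocked) {
            State next = s;
            if (t0[j] == Step::destroy) next.cv_destroyed = true;
            if (any_bad_schedule(t1, i, t0, j + 1, next)) return true;
        }
    }
    return false;  // every schedule from here is safe (or both are finished)
}

// T0: acquire the mutex (wait() returns since complete is true), then
// return from main(), destroying the condition_variable.
bool race_possible(const std::vector<Step>& t1_exit_steps) {
    const std::vector<Step> t0 = {Step::acquire, Step::destroy};
    return any_bad_schedule(t1_exit_steps, 0, t0, 0, State{});
}

}  // namespace model
```

With the current wording, `race_possible({Step::unlock, Step::notify})` finds the schedule unlock, acquire, destroy, notify, which is exactly the timeline above. With the reversed order, `race_possible({Step::notify, Step::unlock})` finds none, because T0's acquire cannot run until after the notify.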

History
Date	User	Action	Args
2023-06-13 14:35:33	admin	set	messages: + msg13625
2023-06-13 14:35:33	admin	set	status: new -> open
2020-02-24 16:10:04	admin	set	messages: + msg11147
2019-12-15 14:27:56	admin	set	messages: + msg10891
2019-12-08 18:50:22	admin	set	messages: + msg10875
2019-11-23 17:29:46	admin	set	messages: + msg10845
2019-11-21 00:00:00	admin	create