Created on 2013-06-17.00:00:00 last changed 116 months ago
[ 2015-02 Cologne ]
Handed over to SG1.
I believe that the following variation on IRIW should admit executions in which c1 = d1 = 5 and c2 = d2 = 0. If this is allowed, then what is the sequence of program evaluations for [atomics.order] p9 that justifies the store to z? It seems that [atomics.order] p9 should not allow this execution, for three reasons: one of the stores to x or y has to appear earlier in the sequence; each of the fetch_adds uses the value read by the previous load in its thread (and thus must appear later in the sequence than that load); and [atomics.order] p9 states that each load must read from the last prior assignment in the sequence.
    atomic_int x;
    atomic_int y;
    atomic_int z;
    int c1, c2, d1, d2;

    static void a(void* obj) {
        atomic_store_explicit(&x, 5, memory_order_relaxed);
    }

    static void b(void* obj) {
        atomic_store_explicit(&y, 5, memory_order_relaxed);
    }

    static void c(void* obj) {
        c1 = atomic_load_explicit(&x, memory_order_relaxed);
        // this could also be an atomic load if the address depends on c1:
        c2 = atomic_fetch_add_explicit(&y, c1, memory_order_relaxed);
    }

    static void d(void* obj) {
        d1 = atomic_load_explicit(&y, memory_order_relaxed);
        d2 = atomic_fetch_add_explicit(&x, d1, memory_order_relaxed);
    }

    int user_main(int argc, char** argv) {
        thrd_t t1, t2, t3, t4;

        atomic_init(&x, 0);
        atomic_init(&y, 0);

        printf("Main thread: creating 4 threads\n");
        thrd_create(&t1, (thrd_start_t)&a, NULL);
        thrd_create(&t2, (thrd_start_t)&b, NULL);
        thrd_create(&t3, (thrd_start_t)&c, NULL);
        thrd_create(&t4, (thrd_start_t)&d, NULL);

        thrd_join(t1);
        thrd_join(t2);
        thrd_join(t3);
        thrd_join(t4);

        printf("c1=%d c2=%d\n",c1,c2);
        printf("d1=%d d2=%d\n",d1,d2);

        // Can this store write 1000 (i.e., c1=d1=5, c2=d2=0)?
        atomic_store(&z, (c1+d1)*100+c2+d2);

        printf("Main thread is finished\n");
        return 0;
    }
It seems that the easiest fix is to allow a load in [atomics.order] p9 to read from any prior store in the evaluation order.
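As a rough sanity check of the argument above, the following sketch (not part of the original report) enumerates every total order of the six atomic evaluations in the queried execution and counts how many orders justify all reads under each reading of [atomics.order] p9. The event names, the `struct event` encoding, and `count_justifying_orders` are hypothetical helpers introduced only for illustration. The sketch assumes that the only ordering constraint besides p9 is that each fetch_add follows the load that supplies its operand; modification order and other coherence requirements are deliberately not modeled.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical encoding of the six atomic evaluations in the queried
   execution (c1 = d1 = 5, c2 = d2 = 0). Names are illustrative only. */
enum { WX, WY, RX_C, RMWY_C, RY_D, RMWX_D, NEVENTS };

struct event {
    int var;       /* 0 = x, 1 = y */
    bool reads;    /* true for loads and fetch_adds */
    int observed;  /* value the execution claims was read */
    bool writes;   /* true for stores and fetch_adds */
    int written;   /* value written (old value + operand for fetch_add) */
};

static const struct event events[NEVENTS] = {
    [WX]     = {0, false, 0, true,  5},  /* thread a: x = 5 */
    [WY]     = {1, false, 0, true,  5},  /* thread b: y = 5 */
    [RX_C]   = {0, true,  5, false, 0},  /* thread c: c1 = x, sees 5 */
    [RMWY_C] = {1, true,  0, true,  5},  /* thread c: c2 = y += c1, sees 0 */
    [RY_D]   = {1, true,  5, false, 0},  /* thread d: d1 = y, sees 5 */
    [RMWX_D] = {0, true,  0, true,  5},  /* thread d: d2 = x += d1, sees 0 */
};

/* Does this total order justify every read?
   any_prior == false: a read must see the last prior assignment.
   any_prior == true : a read may see any prior assignment. */
static bool order_ok(const int order[NEVENTS], bool any_prior)
{
    int pos[NEVENTS];
    for (int i = 0; i < NEVENTS; i++) pos[order[i]] = i;

    /* Each fetch_add takes the preceding load's result as its operand. */
    if (pos[RX_C] > pos[RMWY_C] || pos[RY_D] > pos[RMWX_D]) return false;

    int last[2] = {0, 0};            /* atomic_init assigned 0 to x and y */
    bool seen5[2] = {false, false};  /* has 5 been assigned so far? */
    for (int i = 0; i < NEVENTS; i++) {
        const struct event *e = &events[order[i]];
        if (e->reads) {
            bool ok = any_prior
                ? (e->observed == 0 || seen5[e->var])
                : (e->observed == last[e->var]);
            if (!ok) return false;
        }
        if (e->writes) {
            last[e->var] = e->written;
            if (e->written == 5) seen5[e->var] = true;
        }
    }
    return true;
}

/* Enumerate all permutations of order[k..] by swapping in place. */
static int permute(int order[NEVENTS], int k, bool any_prior)
{
    if (k == NEVENTS) return order_ok(order, any_prior) ? 1 : 0;
    int count = 0;
    for (int i = k; i < NEVENTS; i++) {
        int t = order[k]; order[k] = order[i]; order[i] = t;
        count += permute(order, k + 1, any_prior);
        t = order[k]; order[k] = order[i]; order[i] = t;
    }
    return count;
}

/* Number of total orders that justify the execution under the chosen rule. */
int count_justifying_orders(bool any_prior)
{
    int order[NEVENTS] = {WX, WY, RX_C, RMWY_C, RY_D, RMWX_D};
    return permute(order, 0, any_prior);
}
```

Under these assumptions, `count_justifying_orders(false)` should return 0, matching the claim that no sequence exists under the current wording, while `count_justifying_orders(true)` should be nonzero (for instance, both stores first, then thread c, then thread d).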
That said, I would personally advocate the following. It seems to me that C/C++ atomics are in a somewhat different situation than Java because:

- People are expected to use relaxed C++ atomics in potentially racy situations, so it isn't clear that semantics as complicated as the JMM's causality would be sane.
- People who use C/C++ atomics are likely to be experts and use them in a very controlled fashion. I would be really surprised if compilers would find any real wins by optimizing the use of atomics.
Why not do something like:
There is a satisfaction DAG of all program evaluations. Each evaluation observes the values of variables as computed by some prior assignment in the DAG. There is an edge x->y between two evaluations x and y if:

- the evaluation y observes a value computed by the evaluation x, or
- the evaluation y is an atomic store, the evaluation x is an atomic load, and there is a conditional branch c that may depend (intrathread dependence) on x, with x-sb->c and c-sb->y.
This seems to allow reordering of relaxed atomics that processors do without extra fence instructions, allows most reorderings by the compiler, and gets rid of satisfaction cycles.
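To make the proposed rule concrete, here is a minimal sketch (not from the original report) that encodes candidate executions as edge lists and checks them for cycles with Kahn's algorithm. The names `struct edge`, `is_acyclic`, `iriw_variant`, and `lb_with_branches` are hypothetical. The first edge list is one reading of the IRIW variant above: it has only "observes a value" edges, and initialization is omitted. The second is an illustrative load-buffering example with branches that does not appear in the report: `r1 = x; if (r1 == 42) y = 42;` in one thread and `r2 = y; if (r2 == 42) x = 42;` in the other, with r1 = r2 = 42.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVALS 8

struct edge { int from, to; };

/* Kahn's algorithm: repeatedly remove evaluations with no incoming edge.
   The graph is acyclic exactly when every evaluation can be removed. */
bool is_acyclic(int n, const struct edge *edges, size_t nedges)
{
    int indegree[MAX_EVALS] = {0};
    bool removed[MAX_EVALS] = {false};
    for (size_t i = 0; i < nedges; i++) indegree[edges[i].to]++;

    int remaining = n;
    bool progress = true;
    while (progress) {
        progress = false;
        for (int v = 0; v < n; v++) {
            if (removed[v] || indegree[v] != 0) continue;
            removed[v] = true;
            remaining--;
            progress = true;
            for (size_t i = 0; i < nedges; i++)
                if (edges[i].from == v) indegree[edges[i].to]--;
        }
    }
    return remaining == 0;
}

/* IRIW variant, c1 = d1 = 5, c2 = d2 = 0 (assumed encoding).
   0: store x   1: store y   2: load x (c1)   3: fetch_add y (c2)
   4: load y (d1)   5: fetch_add x (d2)
   Only "observes a value" edges; there are no conditional branches. */
const struct edge iriw_variant[] = {
    {0, 2},  /* c1 observes the store to x */
    {2, 3},  /* fetch_add on y uses c1 as its operand */
    {1, 4},  /* d1 observes the store to y */
    {4, 5},  /* fetch_add on x uses d1 as its operand */
};

/* Illustrative load buffering with branches, r1 = r2 = 42.
   0: load x (r1)   1: store y   2: load y (r2)   3: store x */
const struct edge lb_with_branches[] = {
    {3, 0},  /* r1 observes the store to x */
    {1, 2},  /* r2 observes the store to y */
    {0, 1},  /* store to y is sequenced after a branch on r1 */
    {2, 3},  /* store to x is sequenced after a branch on r2 */
};
```

With this encoding, `is_acyclic(6, iriw_variant, 4)` should hold, so the IRIW variant would be permitted, whereas `is_acyclic(4, lb_with_branches, 4)` should fail because the four edges form a satisfaction cycle.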
History

Date | User | Action | Args |
---|---|---|---|
2015-05-08 04:23:26 | admin | set | messages: + msg7396 |
2015-04-04 16:45:17 | admin | set | messages: + msg7330 |
2015-04-04 16:45:17 | admin | set | status: new -> open |
2013-06-17 00:00:00 | admin | create |