On 04/23/2012 05:03 PM, Dalleau, Frederic wrote: > Hi David, > > On Fri, Apr 20, 2012 at 4:44 PM, David Henningsson > <david.henningsson at canonical.com> wrote: >> While researching a bug I came across something that might be a bug in the >> pa_once logic, but this stuff is tricky, so I might also be missing >> something. >> >> Imagine this: >> >> * Thread 1 runs pa_once_begin, succeeds and starts running the payload (i e >> the code that should only run once). >> * Thread 2 starts running pa_once_begin, but only the first row. We're now >> right *before* pa_atomic_inc(&control->ref) but *after* >> pa_atomic_load(&control->done). >> * Thread 1 finishes the payload, runs pa_once_done which sets control->done >> and frees the mutex. >> * Thread 2 continues, pa_once_begin succeeds and the payload is now run a >> second time! >> > > After reading your mail, I made some experiments by adding a usleep() call > in Thread 1 between pa_atomic_load(&control->done) and > pa_atomic_inc(&control->ref) > and that failed once-test 100% of time. > > I reverted the usleep and made another experiment using 50000 iterations in > once-test and it just failed. > > Good catch ! I tried to look up implementation/algorithm suggestions, and for the ones I found [1], there was no freeing of the mutex. Without freeing, the code becomes simpler. The attached patch is a version of that. I've just tried a simple once-test (but do feel free to run it 50000 times :-) ). But of course, now we leak a mutex. But that's what we already do with the static mutexes we use in a few places already, so maybe it doesn't matter much? -- David Henningsson, Canonical Ltd. https://launchpad.net/~diwic [1] e g http://en.wikipedia.org/wiki/Double-checked_locking