Re: [PATCH 1/5] MSVC: Windows-native implementation for subset of Pthreads API

Dmitry Potapov <dpotapov@xxxxxxxxx> · Wed, 13 Jan 2010 15:53:12 +0300

On Tue, Jan 12, 2010 at 10:13:38PM +0100, Johannes Sixt wrote:
> On Freitag, 8. Januar 2010, Erik Faye-Lund wrote:
> > On Fri, Jan 8, 2010 at 4:32 AM, Dmitry Potapov <dpotapov@xxxxxxxxx> wrote:
> > > AFAIK, Win32 API assumes that reading LONG is always atomic, so
> > > the critical section is not really necesary here, but you need
> > > to declare 'waiters' as 'volatile':
> >
> > "Simple reads and writes to properly-aligned 32-bit variables are
> > atomic operations."
> > http://msdn.microsoft.com/en-us/library/ms684122(VS.85).aspx
> 
> But then the next sentence is:
> 
> "However, access is not guaranteed to be synchronized. If two threads are 
> reading and writing from the same variable, you cannot determine if one 
> thread will perform its read operation before the other performs its write 
> operation."
> 
> This goes without saying, IOW, those Microsofties don't know what they write, 
> which makes the documentation a bit less trustworthy.

The fact that Microsoft documentation is not written by brightest people
in the world is well known...

> 
> Nevertheless, I rewrote the code to use Interlocked* functions, and then read 
> the documentation again. InterlockedIncrement reads, for example:
> 
> "... This function is atomic with respect to calls to other interlocked 
> functions."

I have no clue what the author meant here. Perhaps Microsoft wanted to
reserve the right to implement Interlocked functions using an internal
lock on those architectures that do not have atomic operations. (For
instance, ARMv5 does not have atomic operations).

But any sane implementation of a critical section primitive requires
some operation that is atomic with respect to the user space (or you
kill the performance by calling some syscall in noncontentious case).
For instance, the Linux kernel provides this possibility by providing
__kernel_cmpxchg for ARM, which can be used to implement all other
synchronization primitives such mutexes and conditions. (Or on some
small MMU-less embedded system, disabling interrupts or the scheduler
lock is used). So, any sane implementation should atomic not only in
respect to other Interlock functions but also other synchronization
primitives.

In any case, on x86, it is implemented as _InterlockedIncrement, which
is a built-in function that generates the appropriate assembler instruction.

> 
> In particular, it doesn't say that it is atomic WRT reads such as we have 
> here:
> 
> > >> +     /* we're done waiting, so make sure we decrease waiters count */
> > >> +     EnterCriticalSection(&cond->waiters_lock);
> > >> +     --cond->waiters;
> > >> +     LeaveCriticalSection(&cond->waiters_lock);

and these lines should be replaced with

  InterlockedDecrement(&cond->waiters)

so it will be safe even on utterly idiotic implementation of Interlocked
functions that uses some internal lock; and as I said earlier on x86,
Interlocked functions are translated in appropriate assembler instructions.

Dmitry
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html