Re: [PATCH 1/5] MSVC: Windows-native implementation for subset of Pthreads API

Dmitry Potapov <dpotapov@xxxxxxxxx> · Thu, 14 Jan 2010 08:12:41 +0300

On Wed, Jan 13, 2010 at 07:40:43PM +0100, Johannes Sixt wrote:
> On Mittwoch, 13. Januar 2010, Dmitry Potapov wrote:
> > On Tue, Jan 12, 2010 at 10:13:38PM +0100, Johannes Sixt wrote:
> > > In particular, it doesn't say that it is atomic WRT reads such as we have
> > >
> > > here:
> > > > >> +     /* we're done waiting, so make sure we decrease waiters count
> > > > >> */ +     EnterCriticalSection(&cond->waiters_lock);
> > > > >> +     --cond->waiters;
> > > > >> +     LeaveCriticalSection(&cond->waiters_lock);
> >
> > and these lines should be replaced with
> >
> >   InterlockedDecrement(&cond->waiters)
> 
> Ah, yes, of course. I quoted the wrong section, sorry. By "atomic WRT reads" I 
> meant this snippet:
> 
> >> +     EnterCriticalSection(&cond->waiters_lock);
> >> +     have_waiters = cond->waiters > 0;
> >> +     LeaveCriticalSection(&cond->waiters_lock);
> 
> Is there "InterlockedRead()"? I suppose no, but I would get confirmation that 
> a simple memory mov instruction is atomic WRT Interlocked* functions.

If I were writing Interlocked API, I would certainly add InterlockedRead()
and InterlockedWrite() functions, but somehow Microsoft decided that these
functions are redundant. Instead, they provided the following comment:
"Simple reads and writes to properly-aligned 32-bit variables are atomic
operations."
http://msdn.microsoft.com/en-us/library/ms684122%28VS.85%29.aspx

If an operation is atomic, it means that no matter what else is happening
on the system, this operation will performed atomically WRT with any other.
So, yes, the 'mov' instruction is atomic WRT Interlocked functions, no
matter how Interlocked functions are implemented.

As to your concern about gcc doing something different. Let's take a look
how atomic_read is implemented in the Linux kernel:

arch/alpha/include/asm/atomic.h:
#define atomic_read(v)		((v)->counter + 0)

arch/arm/include/asm/atomic.h
#define atomic_read(v)	((v)->counter)

arch/avr32/include/asm/atomic.h
#define atomic_read(v)		((v)->counter)

arch/blackfin/include/asm/atomic.h
#define atomic_read(v)	__raw_uncached_fetch_asm(&(v)->counter)

arch/cris/include/asm/atomic.h
#define atomic_read(v) ((v)->counter)

arch/frv/include/asm/atomic.h
#define atomic_read(v)		((v)->counter)

arch/h8300/include/asm/atomic.h
#define atomic_read(v)		((v)->counter)

arch/ia64/include/asm/atomic.h
#define atomic_read(v)		((v)->counter)

arch/m32r/include/asm/atomic.h
#define atomic_read(v)	((v)->counter)

arch/m68k/include/asm/atomic_mm.h
#define atomic_read(v)		((v)->counter)

arch/m68k/include/asm/atomic_no.h
#define atomic_read(v)		((v)->counter)

arch/mips/include/asm/atomic.h
#define atomic_read(v)		((v)->counter)

arch/mn10300/include/asm/atomic.h
#define atomic_read(v)	((v)->counter)

arch/parisc/include/asm/atomic.h
static __inline__ int atomic_read(const atomic_t *v)
{
	return v->counter;
}

arch/powerpc/include/asm/atomic.h
static __inline__ int atomic_read(const atomic_t *v)
{
	int t;

	__asm__ __volatile__("lwz%U1%X1 %0,%1" : "=r"(t) : "m"(v->counter));

	return t;
}

arch/s390/include/asm/atomic.h
static inline int atomic_read(const atomic_t *v)
{
	barrier();
	return v->counter;
}

arch/sh/include/asm/atomic.h
#define atomic_read(v)		((v)->counter)

arch/sparc/include/asm/atomic_32.h
#define atomic_read(v)          ((v)->counter)

arch/sparc/include/asm/atomic_64.h
#define atomic_read(v)		((v)->counter)

arch/x86/include/asm/atomic_32.h
static inline int atomic_read(const atomic_t *v)
{
	return v->counter;
}

arch/x86/include/asm/atomic_64.h
static inline int atomic_read(const atomic_t *v)
{
	return v->counter;
}

arch/xtensa/include/asm/atomic.h
#define atomic_read(v)		((v)->counter)

===

You see, except PowerPC and s390, it is just 'mov' written in C.
Moreover, if you look at git log, you will see that using asm
for PowerPC is just a matter of optimization:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9f0cbea0d8cc47801b853d3c61d0e17475b0cc89

As to s390, I have no idea why it uses barrier(), but I do not think
that Windows will ever run on this obsolete architecture, so I don't
even care to look at it closer...

In fact, ability to read and write some integer type is a requirement of
the C standard (at least, for C99):

Section 7.14, paragraph 2:
"The type defined is
       sig_atomic_t
which is the (possibly volatile-qualified) integer type of an object
that can be accessed as an atomic entity, even in the presence of
asynchronous interrupts."

Section 7.18.3, paragraph 3:
"If sig_atomic_t (see 7.14) is defined as a signed integer type, the
value of SIG_ATOMIC_MIN shall be no greater than −127 and the value of
SIG_ATOMIC_MAX shall be no less than 127; otherwise, sig_atomic_t is
defined as an unsigned integer type, and the value of SIG_ATOMIC_MIN
shall be 0 and the value of SIG_ATOMIC_MAX shall be no less than 255."

So, for any architecture where you can use C, it should exist some
integer type that can be read and written atomically (though, it can
be as small as one byte).

Finally, there is a paranoiac implementation of InterlockedRead(&foo):

   result = InterlockedAdd(&foo, 0)

but, IMHO, it is pathetic...

I hope I have explained well enough why I can vouch that the above
assignment works atomically WRT any Interlock function.

Dmitry
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html