Well, as sometimes happens, as soon as I hit the send/post button, the
answer occurs to me. Well, part of it, anyway.
I'll break the problem down into 2 families. The first family is what I
think of as the "Interlocked" family of functions. These all require a
memory operand and use the "lock" prefix (remember, I'm on x64). For
these functions, I believe the correct definition is of the form:
    __CRT_INLINE BOOLEAN InterlockedBitTestAndSet(volatile LONG *Base, LONG Bit)
    {
        int old = 0;
        __asm__ __volatile__("lock ; btsl %2,%1\n\tsbbl %0,%0 "
            :"=r" (old)
            :"m" (*Base), "Ir" (Bit)
            :"memory", "cc");
        return (BOOLEAN) (old!=0);
    }
Note that Base is NOT listed as an output, and yes, I'm using the
"memory" clobber. While there is a performance penalty for this
clobber, I believe that for all instructions using the "lock" prefix,
you *want* the "ReadWriteBarrier" this achieves to ensure proper
coordination.
But I still feel a little uncomfortable not listing Base as an output
parameter. Can someone confirm that by clobbering "memory" I'm covered?
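For comparison, here is a sketch of the variant that does list the memory as an output, using "+m" and keeping the "memory" clobber. This is only an illustration of the alternative under discussion, not a confirmed answer. The name interlocked_bts_rw is made up, and plain int / unsigned char stand in for LONG / BOOLEAN so the fragment compiles on its own with gcc on x86 or x64:

```c
#include <assert.h>

/* Hypothetical variant: *base is declared as a read-write memory operand
 * ("+m") in addition to the "memory" clobber. Names and types are
 * stand-ins chosen for this sketch, not taken from any header. */
static inline unsigned char
interlocked_bts_rw(volatile int *base, int bit)
{
    int old;
    /* %0 = old, %1 = *base, %2 = bit.
     * btsl copies the selected bit into CF and then sets it;
     * sbbl %0,%0 turns CF into 0 or -1. */
    __asm__ __volatile__("lock ; btsl %2,%1\n\tsbbl %0,%0"
                         : "=r" (old), "+m" (*base)
                         : "Ir" (bit)
                         : "memory", "cc");
    return (unsigned char)(old != 0);
}

/* Small self-check: the first call sees the bit clear, the second sees it set. */
static inline void
interlocked_bts_rw_check(void)
{
    volatile int word = 0;
    assert(interlocked_bts_rw(&word, 3) == 0);
    assert(word == 8);
    assert(interlocked_bts_rw(&word, 3) == 1);
}
```

Whether the extra "+m" adds anything once "memory" is clobbered is exactly the open question here; the sketch only shows what that form looks like.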
So, that leaves the non-interlocked instructions. And well, I have a
theory. However, it requires some serious "rules-lawyering" of the
docs. Take that same routine, but remove the interlock:
    __CRT_INLINE BOOLEAN BitTestAndSet(LONG *Base, LONG Bit)
    {
        int old = 0;
        __asm__ ("btsl %3,%1\n\tsbbl %0,%0 "
            :"=r" (old), "+rm" (*Base)
            :"1" (*Base), "Ir" (Bit)
            :"cc");
        return (BOOLEAN) (old!=0);
    }
Now, at first blush, this might look like it violates what the docs say:
"You should only use read-write operands when the constraints for the
operand [...] allow a register."
And clearly I am using memory. However, the docs *don't* say "ONLY
allow a register." And my definition does "allow a register." Yes,
it's a stretch, but technically I am in compliance with the stated
requirements, if not perhaps the spirit.
Unfortunately, it's possible that the people who actually write the
compiler/optimizer may read this sentence differently. So I'm still
hoping for an informed response about what needs to be done to meet the
standards here.
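Whichever reading turns out to be right, the asm should at least agree with a plain C reference. The sketch below pairs such a reference with the plainer "+m"-only form (no "r" alternative, no "1" input), which is the form a strict reading of that sentence would rule out. Both function names are made up for this example, int / unsigned char stand in for LONG / BOOLEAN, and the reference only covers Bit values 0 through 31:

```c
#include <assert.h>

/* Plain C reference for bit numbers 0..31: return the old bit, then set it. */
static inline unsigned char
bts_reference(int *base, int bit)
{
    unsigned int mask = 1u << (bit & 31);
    unsigned char old = (unsigned char)(((unsigned int)*base & mask) != 0);
    *base = (int)((unsigned int)*base | mask);
    return old;
}

/* Hypothetical asm variant that uses only "+m" for *base.
 * %0 = old, %1 = *base, %2 = bit. No "memory" clobber, because this is
 * the non-interlocked case and only *base is touched. */
static inline unsigned char
bts_plus_m(int *base, int bit)
{
    int old;
    __asm__("btsl %2,%1\n\tsbbl %0,%0"
            : "=r" (old), "+m" (*base)
            : "Ir" (bit)
            : "cc");
    return (unsigned char)(old != 0);
}

/* Compare the two for every bit position of a 32-bit word. */
static inline void
bts_compare_all_bits(void)
{
    for (int bit = 0; bit < 32; bit++) {
        int a = 0x55555555, b = 0x55555555;
        assert(bts_reference(&a, bit) == bts_plus_m(&b, bit));
        assert(a == b);
    }
}
```

Agreement in a test like this shows only that the generated code was right for one compiler and one set of flags. It does not settle what the constraints are documented to guarantee, which is the real question.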
Hopefully having some specifics will make this question easier to answer.
Thanks.
On 3/24/2013 5:16 PM, dw wrote:
I am trying to figure out the proper constraints for __asm__
instructions on x64 that both read and write memory (for example "bts"
or "lock xadd"). I'm ok writing the assembler, I'm just struggling
with writing the constraints. My problem is that the gcc docs forbid
every permutation of constraint I can think of that would define this
behavior. You have 3 mutually exclusive requirements:
1) Since the value being read+written is memory, the memory must be
specified as both input and output.
2) The docs say: "You should only use read-write operands when the
constraints for the operand [...] allow a register."
3) The docs say: "You may not write a clobber description in a way
that overlaps with an input or output operand. [...] There is no way
for you to specify that an input operand is modified without also
specifying it as an output operand."
In summary: You have to specify something for output, but you can't
have "=m" (#1) or "+m" (#2) for output, and you can't fake the output
by using a clobber constraint against the address (#3).
Which effectively means there is *no* way to specify a memory
constraint as read+write. Since I don't believe that is true, I can
only assume the docs are wrong. The question is: which part?
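To make the conflict concrete, this is roughly the shape one would naturally write for "lock xadd". It is a sketch only: the name xadd_sketch is made up, and it uses exactly the "+m" operand that point 2 appears to rule out:

```c
#include <assert.h>

/* Hypothetical helper: atomically add `value` to *addend and return the
 * value *addend held beforehand. */
static inline int
xadd_sketch(volatile int *addend, int value)
{
    /* %0 = value (register, read and written), %1 = *addend (memory,
     * read and written). xaddl stores the sum in memory and leaves the
     * old memory contents in the register. */
    __asm__ __volatile__("lock ; xaddl %0,%1"
                         : "+r" (value), "+m" (*addend)
                         :
                         : "memory", "cc");
    return value;
}

/* Small self-check of the exchange-and-add behaviour. */
static inline void
xadd_sketch_check(void)
{
    volatile int counter = 5;
    assert(xadd_sketch(&counter, 3) == 5);
    assert(counter == 8);
}
```

The register operand is uncontroversial, since "+r" allows a register. The memory operand is the one that seems to have no documented way to be expressed.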
I have seen a bunch of code that uses constraints that violate these
documented requirements. I am sure that most of them work (at least
most of the time). But I don't want to base my program on code that I
know violates the docs. That's just asking for trouble.
I'm looking for the officially supported and documented way to specify
memory constraints as read+write. And obviously I want what I write
to produce the most performant code consistent with supported standards.
Thanks.