Re: paradox re: constraints for ReadWrite memory in asm instruction

As sometimes happens, the answer occurred to me as soon as I hit the send/post button. Well, part of it, anyway.

I'll break the problem down into 2 families. The first family is what I think of as the "Interlocked" family of functions. These all require a memory operand and use the "lock" prefix (remember, I'm on x64). For these functions, I believe the correct definition is of the form:

    __CRT_INLINE BOOLEAN InterlockedBitTestAndSet(volatile LONG *Base, LONG Bit)
    {
        int old = 0;
        __asm__ __volatile__("lock ; btsl %2,%1\n\tsbbl %0,%0 "
            :"=r" (old)
            :"m" (*Base), "Ir" (Bit)
            :"memory", "cc");
        return (BOOLEAN) (old!=0);
    }

Note that Base is NOT listed as an output, and yes, I'm using the "memory" clobber. While there is a performance penalty for this clobber, I believe that for all instructions using the "lock" prefix, you *want* the "ReadWriteBarrier" this achieves to ensure proper coordination.

But I still feel a little uncomfortable not listing Base as an output parameter. Can someone confirm that by clobbering "memory" I'm covered?
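
For comparison, here is a minimal sketch of the other spelling, in which *Base is also listed as a read-write memory operand ("+m") and the "memory" clobber is kept. This is a form often seen in existing code; whether it satisfies the wording of the docs is exactly the open question, so treat it as an illustration rather than the sanctioned answer. The names LONG32, locked_bts_rw and locked_bts_rw_check are made up for this example, and LONG32 is an assumed stand-in for the 32-bit Windows LONG.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for the 32-bit Windows LONG used in the post. */
typedef int32_t LONG32;

/* Hypothetical variant: same instructions as above, but *Base is listed
   as a read-write memory operand in addition to the "memory" clobber.
   Bit is expected to be in the range 0..31. */
static inline int locked_bts_rw(volatile LONG32 *Base, LONG32 Bit)
{
    int old;
    __asm__ __volatile__("lock ; btsl %2,%1\n\tsbbl %0,%0"
        : "=r" (old), "+m" (*Base)   /* %0 = old, %1 = *Base (read and written) */
        : "Ir" (Bit)                 /* %2 = bit index, immediate or register */
        : "memory", "cc");
    return old != 0;
}

/* Small self-check: the bit starts clear, is set by the first call,
   and is reported as already set by the second call. */
static int locked_bts_rw_check(void)
{
    volatile LONG32 word = 0;
    int first = locked_bts_rw(&word, 3);
    int second = locked_bts_rw(&word, 3);
    assert(first == 0);
    assert(second == 1);
    assert(word == 8);
    return 1;
}
```

The only difference from the definition above is the extra "+m" output; the clobber list is unchanged, so the barrier behavior should be the same.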

So, that leaves the non-interlocked instructions. And well, I have a theory. However, it requires some serious "rules-lawyering" of the docs. Take that same routine, but remove the interlock:

    __CRT_INLINE BOOLEAN BitTestAndSet(LONG *Base, LONG Bit)
    {
        int old = 0;
        __asm__ ("btsl %3,%1\n\tsbbl %0,%0 "
            :"=r" (old), "+rm" (*Base)
            :"1" (*Base), "Ir" (Bit)
            :"cc");
        return (BOOLEAN) (old!=0);
    }

Now, at first blush, this might look like it violates what the docs say:

"You should only use read-write operands when the constraints for the operand [...] allow a register."

And clearly I am using memory. However, the docs *don't* say "ONLY allow a register," and my definition does "allow a register." Yes, it's a stretch, but technically I am in compliance with the stated requirements, if perhaps not with their spirit.

Unfortunately, it's possible that the people who actually write the compiler/optimizer may read this sentence differently. So I'm still hoping for an informed response about what needs to be done to meet the standards here.
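
To make the alternative concrete, here is a minimal sketch of the non-interlocked routine written with a plain "+m" operand and no matching input, which is the reading the quoted sentence seems to rule out. It is offered only as something to compare against, not as the documented answer. LONG32, bts_rw and bts_rw_check are hypothetical names, and LONG32 is an assumed stand-in for the 32-bit LONG.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for the 32-bit Windows LONG used in the post. */
typedef int32_t LONG32;

/* Hypothetical non-interlocked variant with a single read-write memory
   operand. No "memory" clobber and no volatile: the "+m" operand is the
   only memory the asm is declared to touch. Bit must be in 0..31; with a
   memory operand and a register bit index, bts can address bits outside
   the 32-bit word. */
static inline int bts_rw(LONG32 *Base, LONG32 Bit)
{
    int old;
    __asm__("btsl %2,%1\n\tsbbl %0,%0"
        : "=r" (old), "+m" (*Base)   /* %0 = old, %1 = *Base */
        : "Ir" (Bit)                 /* %2 = bit index */
        : "cc");
    return old != 0;
}

/* Small self-check on a local word. */
static int bts_rw_check(void)
{
    LONG32 word = 0;
    int first = bts_rw(&word, 5);
    int second = bts_rw(&word, 5);
    assert(first == 0);
    assert(second == 1);
    assert(word == 32);
    return 1;
}
```

Because *Base is a declared output here, the compiler has to keep the asm whenever the updated word is used later, even if the returned old value is ignored.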

Hopefully having some specifics will make this question easier to answer.

Thanks.

On 3/24/2013 5:16 PM, dw wrote:
I am trying to figure out the proper constraints for __asm__ instructions on x64 that both read and write memory (for example "bts" or "lock xadd"). I'm ok writing the assembler, I'm just struggling with writing the constraints. My problem is that the gcc docs forbid every permutation of constraint I can think of that would define this behavior. You have 3 mutually exclusive requirements:

1) Since the value being read+written is memory, the memory must be specified as both input and output.
2) The docs say: "You should only use read-write operands when the constraints for the operand [...] allow a register."
3) The docs say: "You may not write a clobber description in a way that overlaps with an input or output operand. [...] There is no way for you to specify that an input operand is modified without also specifying it as an output operand."

In summary: You have to specify something for output, but you can't have "=m" (#1) or "+m" (#2) for output, and you can't fake the output by using a clobber constraint against the address (#3).

Which effectively means there is *no* way to specify a memory constraint as read+write. Since I don't believe that is true, I can only assume the docs are wrong. The question is: which part?

I have seen a bunch of code that uses constraints that violate these documented requirements. I am sure that most of them work (at least most of the time). But I don't want to base my program on code that I know violates the docs. That's just asking for trouble.
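
As an illustration of the kind of code meant here, this is a minimal sketch of a "lock xadd" wrapper in the style commonly seen, with the memory operand declared as "+m". It works with GCC in practice, but it is precisely the pattern whose documented status is in question. The names xadd_fetch_add and xadd_fetch_add_check are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper: atomically adds Value to *Target and returns the
   value *Target held before the addition. xadd writes the old memory
   contents back into the register operand, so Value is read-write too. */
static inline int32_t xadd_fetch_add(volatile int32_t *Target, int32_t Value)
{
    __asm__ __volatile__("lock ; xaddl %0,%1"
        : "+r" (Value), "+m" (*Target)   /* %0 = register, %1 = memory */
        : /* no input-only operands */
        : "memory", "cc");
    return Value;
}

/* Small self-check: 5 + 3 leaves 8 in memory and returns the old 5. */
static int xadd_fetch_add_check(void)
{
    volatile int32_t counter = 5;
    int32_t before = xadd_fetch_add(&counter, 3);
    assert(before == 5);
    assert(counter == 8);
    return 1;
}
```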

I'm looking for the officially supported and documented way to specify memory constraints as read+write. And obviously I want what I write to produce the most performant code consistent with supported standards.

Thanks.






