Re: [RFCv2 05/10] x86/mm: Provide untagged_addr() helper

On Thu, May 12 2022 at 17:16, Thomas Gleixner wrote:
> On Thu, May 12 2022 at 16:23, Peter Zijlstra wrote:
>> On Thu, May 12, 2022 at 03:06:38PM +0200, Thomas Gleixner wrote:
>>
>>> #define untagged_addr(addr)	({			\
>>> 	u64 __addr = (__force u64)(addr);		\
>>> 							\
>>> 	__addr &= current->thread.lam_untag_mask;	\
>>> 	(__force __typeof__(addr))__addr;		\
>>> })
>>> 
>>> No conditionals, fast _and_ correct. Setting this untag mask up once
>>> when LAM is enabled is not rocket science.
>>
>> But that goes wrong if someone ever wants to untag a kernel address and
>> not use the result for access_ok().
>>
>> I'd feel better about something like:
>>
>> 	s64 __addr = (addr);
>> 	s64 __sign = __addr;
>>
>> 	__sign >>= 63;
>> 	__sign &= lam_untag_mask;
>
> that needs to be
>
>  	__sign &= ~lam_untag_mask;
>
>> 	__addr &= lam_untag_mask;
>> 	__addr |= __sign;
>>
>> 	__addr;
>>
>> Which simply extends bit 63 downwards -- although possibly there's an
>> easier way to do that, this is pretty gross.
>
> For the price of a conditional:
>
>     __addr &= lam_untag_mask;
>     if (__addr & BIT(63))
>         __addr |= ~lam_untag_mask;
>
> Now you have the choice between gross and ugly.

Though we can also replace your flavour of gross with a different
flavour of gross:

	s64 sign = (s64)(addr) >> 63;

	addr ^= sign;
	addr &= mask;
	addr ^= sign;

After twisting my brain around replacing gross with something differently
gross and coming up with the gem above, I actually compiled the variants
and discovered that GCC compiles your flavour of gross to exactly this:

        mov    %rdi,%rax
        sar    $0x3f,%rax
        xor    %rax,%rdi
        and    %rsi,%rdi
        xor    %rdi,%rax

I have to admit that compilers are sometimes pretty smart. I might have
to rethink my prejudice. :)

But then clang converts your flavour of 'gross' to:

     	mov    %rsi,%rax
     	mov    %rsi,%rcx
     	and    %rdi,%rax
     	sar    $0x3f,%rdi
     	not    %rcx
     	and    %rdi,%rcx
     	or     %rcx,%rax

and my explicit flavour to:

      	mov    %rdi,%rax
      	mov    %rdi,%rcx
      	sar    $0x3f,%rcx
      	xor    %rcx,%rax
      	and    %rsi,%rax
      	xor    %rcx,%rax

which is at least slightly less convoluted, but still has a pointless mov
in there. Note that this was compiled in user space with noinline
functions. I did some inlined variants as well and clang still insists
on using an extra register for no obvious reason. This might be more
efficient in reality, but I haven't bothered to write a test which
might give an answer via perf.

The ugly variant with the conditional resolves to the same code for both compilers:

       	mov    %rsi,%rax
       	mov    %rsi,%rcx
       	not    %rcx
       	or     %rdi,%rcx
       	and    %rdi,%rax
       	test   %rdi,%rdi
       	cmovs  %rcx,%rax

At least they agree on that one.

But whatever we choose, it's sad that we need to support interfaces
which swallow any pointer (user or kernel), because otherwise this
really boils down to a single AND (or OR) operation plus the
corresponding mov to retrieve the mask.

Thanks,

        tglx



