Re: [PATCH 2/4] rcu/tasks: Handle new PF_IDLE semantics

Hello,

On Tue, 31 Oct 2023, Peter Zijlstra wrote:

(I can't say anything about the WRITE_ONCE/rcu code, just about the below 
codegen part)

> Welcome is not the right word. What bugs me most is that this was never
> raised when this code was written :/
> 
> Mostly my problem is that GCC generates such utter shite when you
> mention volatile. See, the below patch changes the perfectly fine and
> non-broken:
> 
> 0148  1d8:      49 83 06 01             addq   $0x1,(%r14)

What is non-broken here that is ...

> into:
> 
> 0148  1d8:	49 8b 06             	mov    (%r14),%rax
> 014b  1db:	48 83 c0 01          	add    $0x1,%rax
> 014f  1df:	49 89 06             	mov    %rax,(%r14)

... broken here?  (Sure, code size and additional register use, but I don't 
think that is what you mean by broken.)

> For absolutely no reason :-(

The reason is simple (and should be obvious): to adhere to the abstract 
machine regarding volatile.  When x is volatile, x++ consists of a 
read and a write, in this order.  The easiest way to ensure this is to 
actually generate a read and a write instruction.  Anything else is an 
optimization, and for each such optimization you need to actively find an 
argument why it is correct to start with (and then whether it is an 
optimization at all).  In this case the argument needs to show that an 
rmw instruction on x86 is in fact completely equivalent to the separate 
instructions, from read cycle to write cycle, over all pipeline stages, 
on all implementations of x86.  I.e. that an rmw instruction is spec'ed 
to be equivalent.

You most probably can make that argument in this specific case, I'll give 
you that.  But why bother to start with, in a piece of software that is 
already fairly complex (the compiler)?  It's much easier to just not do 
much of anything with volatile accesses at all and be guaranteed correct.
Even more so as the software author, when using volatile, is most likely 
much more interested in correct code (even from an abstract machine 
perspective) than in micro-optimizations.
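
To make the read-then-write point concrete, here is a minimal sketch.  The 
names (counter, bump_short, bump_explicit, self_check) are made up for 
illustration and are not taken from the patch.  It only shows that, for a 
volatile object, x++ can be read as shorthand for one volatile read 
followed by one volatile write, which is the shape of the mov/add/mov 
sequence quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter; stands in for whatever (%r14) points at above. */
static volatile uint64_t counter;

/* The short spelling under discussion. */
static void bump_short(void)
{
    counter++;
}

/* The same operation with the two volatile accesses written out. */
static void bump_explicit(void)
{
    uint64_t tmp = counter;  /* volatile read  */
    counter = tmp + 1;       /* volatile write */
}

/* Single-threaded sanity check: both spellings add exactly one. */
static void self_check(void)
{
    counter = 41;
    bump_short();
    assert(counter == 42);
    bump_explicit();
    assert(counter == 43);
}
```

Note that neither spelling is atomic with respect to other CPUs: volatile 
only promises that the accesses happen, not that the pair is indivisible.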

> At least clang doesn't do this, it stays:
> 
> 0403  413:      49 ff 45 00             incq   0x0(%r13)
> 
> irrespective of the volatile.

And are you 100% sure that this is correct?  Even for x86 CPU 
pipeline implementations that you don't know intimately? ;-)

But all that seems to be a side-track anyway.  What's your real worry with 
the code sequence generated by GCC?


Ciao,
Michael.


