Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 18, 2020 at 10:36:10AM -0700, Dan Williams wrote:
> On Wed, Mar 18, 2020 at 1:20 AM George Spelvin <lkml@xxxxxxx> wrote:
>> On Tue, Mar 17, 2020 at 08:53:55PM -0700, Dan Williams wrote:
>>> I had the impression that unless unlikely is "mostly never" then it
>>> can do more harm than good. Is a branch guaranteed to be taken every
>>> BITS_PER_LONG'th occurrence really a candidate for unlikely()
>>> annotation?
>>
>> I had to look this up.  GCC manual:
>>
>> For the purposes of branch prediction optimizations, the probability
>> that a '__builtin_expect' expression is 'true' is controlled by GCC's
>> 'builtin-expect-probability' parameter, which defaults to 90%.  You can
>> also use '__builtin_expect_with_probability' to explicitly assign a
>> probability value to individual expressions.
>>
>> So I think that <= 10% is good enough, which is true in this case.
>>
>> I was tring to encourage the compiler to:
>> * Place this code path out of line, and
>> * Not do the stack manipulations (build a frame, spill registers)
>>   needed for a non-leaf function if this path isn't taken.
> 
> Understood, I think it's ok in this case because the shuffling only
> happens for order-10 page free events by default so it will be
> difficult to measure the perf impact either way.  But in other kernel
> contexts I think unlikely() annotation should come with numbers, 90%
> not taken is not sufficient in and of itself.

I'm not sure I fully understand your point.  I *think* you're
editorializing on unlikely() in general and not this specific code,
but it's a little hard to follow.

Your mention of "order-10 page free events" is confusing.  Do you mean 
"(order-10 page) free events", i.e. freeing of 1024 consecutive pages?  
Or are you using "order" as a synonym for "approximately" and you mean 
"approximately 10 (page free event)s"?

We both agree (I hope) that the number here is obvious on brief 
inspection: 1/BITS_PER_LONG.  

GCC's heuristics are tuned to value cycles on the fast path 9x as much as 
cycles on the slow path, so it will spend up to 9 cycles on the slow path 
to save a cycle on the fast path.

I've found one comment (https://pastebin.com/S8Y8tqZy) saying that
GCC < 9.x was a lot sloppier on the cost ratio and could pessimize
the code if the branch was more than ~ 1% taken.  Perhaps that's what 
you're remembering?

Fortunately, 1/64 = 1.56% is fairly close to 1%. so I'm not too
worried.

> You can add:
> 
> Acked-by: Dan Williams <dan.j.williams@xxxxxxxxx>

Thank you!




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux