Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 18, 2020 at 12:29 PM George Spelvin <lkml@xxxxxxx> wrote:
>
> On Wed, Mar 18, 2020 at 10:36:10AM -0700, Dan Williams wrote:
> > On Wed, Mar 18, 2020 at 1:20 AM George Spelvin <lkml@xxxxxxx> wrote:
> >> On Tue, Mar 17, 2020 at 08:53:55PM -0700, Dan Williams wrote:
> >>> I had the impression that unless unlikely is "mostly never" then it
> >>> can do more harm than good. Is a branch guaranteed to be taken every
> >>> BITS_PER_LONG'th occurrence really a candidate for unlikely()
> >>> annotation?
> >>
> >> I had to look this up.  GCC manual:
> >>
> >> For the purposes of branch prediction optimizations, the probability
> >> that a '__builtin_expect' expression is 'true' is controlled by GCC's
> >> 'builtin-expect-probability' parameter, which defaults to 90%.  You can
> >> also use '__builtin_expect_with_probability' to explicitly assign a
> >> probability value to individual expressions.
> >>
> >> So I think that <= 10% is good enough, which is true in this case.
> >>
> >> I was tring to encourage the compiler to:
> >> * Place this code path out of line, and
> >> * Not do the stack manipulations (build a frame, spill registers)
> >>   needed for a non-leaf function if this path isn't taken.
> >
> > Understood, I think it's ok in this case because the shuffling only
> > happens for order-10 page free events by default so it will be
> > difficult to measure the perf impact either way.  But in other kernel
> > contexts I think unlikely() annotation should come with numbers, 90%
> > not taken is not sufficient in and of itself.
>
> I'm not sure I fully understand your point.  I *think* you're
> editorializing on unlikely() in general and not this specific code,
> but it's a little hard to follow.

Yes, editorializing on unlikely(). Specifically I would normally ask
for perf numbers to show that the hint is worth it, but I talked
myself out of asking for that in this case.

> Your mention of "order-10 page free events" is confusing.  Do you mean
> "(order-10 page) free events", i.e. freeing of 1024 consecutive pages?
> Or are you using "order" as a synonym for "approximately" and you mean
> "approximately 10 (page free event)s"?

I'm referring to this:

      if (is_shuffle_order(order))
                add_to_free_area_random(page, &zone->free_area[order],

Where shuffle order is MAX_ORDER-1. I.e. this code is only triggered
when we might be releasing a 4MB buddy-page.

> We both agree (I hope) that the number here is obvious on brief
> inspection: 1/BITS_PER_LONG.
>
> GCC's heuristics are tuned to value cycles on the fast path 9x as much as
> cycles on the slow path, so it will spend up to 9 cycles on the slow path
> to save a cycle on the fast path.
>
> I've found one comment (https://pastebin.com/S8Y8tqZy) saying that
> GCC < 9.x was a lot sloppier on the cost ratio and could pessimize
> the code if the branch was more than ~ 1% taken.  Perhaps that's what
> you're remembering?

Yes, thanks for that digging!

> Fortunately, 1/64 = 1.56% is fairly close to 1%. so I'm not too
> worried.

Makes sense.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux