On Wed, Mar 18, 2020 at 12:29 PM George Spelvin <lkml@xxxxxxx> wrote: > > On Wed, Mar 18, 2020 at 10:36:10AM -0700, Dan Williams wrote: > > On Wed, Mar 18, 2020 at 1:20 AM George Spelvin <lkml@xxxxxxx> wrote: > >> On Tue, Mar 17, 2020 at 08:53:55PM -0700, Dan Williams wrote: > >>> I had the impression that unless unlikely is "mostly never" then it > >>> can do more harm than good. Is a branch guaranteed to be taken every > >>> BITS_PER_LONG'th occurrence really a candidate for unlikely() > >>> annotation? > >> > >> I had to look this up. GCC manual: > >> > >> For the purposes of branch prediction optimizations, the probability > >> that a '__builtin_expect' expression is 'true' is controlled by GCC's > >> 'builtin-expect-probability' parameter, which defaults to 90%. You can > >> also use '__builtin_expect_with_probability' to explicitly assign a > >> probability value to individual expressions. > >> > >> So I think that <= 10% is good enough, which is true in this case. > >> > >> I was tring to encourage the compiler to: > >> * Place this code path out of line, and > >> * Not do the stack manipulations (build a frame, spill registers) > >> needed for a non-leaf function if this path isn't taken. > > > > Understood, I think it's ok in this case because the shuffling only > > happens for order-10 page free events by default so it will be > > difficult to measure the perf impact either way. But in other kernel > > contexts I think unlikely() annotation should come with numbers, 90% > > not taken is not sufficient in and of itself. > > I'm not sure I fully understand your point. I *think* you're > editorializing on unlikely() in general and not this specific code, > but it's a little hard to follow. Yes, editorializing on unlikely(). Specifically I would normally ask for perf numbers to show that the hint is worth it, but I talked myself out of asking for that in this case. > Your mention of "order-10 page free events" is confusing. Do you mean > "(order-10 page) free events", i.e. freeing of 1024 consecutive pages? > Or are you using "order" as a synonym for "approximately" and you mean > "approximately 10 (page free event)s"? I'm referring to this: if (is_shuffle_order(order)) add_to_free_area_random(page, &zone->free_area[order], Where shuffle order is MAX_ORDER-1. I.e. this code is only triggered when we might be releasing a 4MB buddy-page. > We both agree (I hope) that the number here is obvious on brief > inspection: 1/BITS_PER_LONG. > > GCC's heuristics are tuned to value cycles on the fast path 9x as much as > cycles on the slow path, so it will spend up to 9 cycles on the slow path > to save a cycle on the fast path. > > I've found one comment (https://pastebin.com/S8Y8tqZy) saying that > GCC < 9.x was a lot sloppier on the cost ratio and could pessimize > the code if the branch was more than ~ 1% taken. Perhaps that's what > you're remembering? Yes, thanks for that digging! > Fortunately, 1/64 = 1.56% is fairly close to 1%. so I'm not too > worried. Makes sense.