Re: [PATCH v3 2/2] locking/rwsem: Optimize down_read_trylock()
- To: Waiman Long <longman@xxxxxxxxxx>
- Subject: Re: [PATCH v3 2/2] locking/rwsem: Optimize down_read_trylock()
- From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 14 Feb 2019 10:09:44 -0800
- Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Will Deacon <will.deacon@xxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Linux List Kernel Mailing <linux-kernel@xxxxxxxxxxxxxxx>, "linux-alpha@xxxxxxxxxxxxxxx" <linux-alpha@xxxxxxxxxxxxxxx>, "linux-alpha@xxxxxxxxxxxxxxx" <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>, linux-hexagon@xxxxxxxxxxxxxxx, linux-ia64@xxxxxxxxxxxxxxx, linuxppc-dev@xxxxxxxxxxxxxxxx, Linux-sh list <linux-sh@xxxxxxxxxxxxxxx>, sparclinux@xxxxxxxxxxxxxxx, linux-xtensa@xxxxxxxxxxxxxxxx, linux-arch <linux-arch@xxxxxxxxxxxxxxx>, "the arch/x86 maintainers" <x86@xxxxxxxxxx>, Arnd Bergmann <arnd@xxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, Davidlohr Bueso <dave@xxxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
- In-reply-to: <CAHk-=whKmuranma_HUOKBbDJHGmdWZr9MYW-+cmGzsOiJ2N1Sg@mail.gmail.com>
- References: <1550089932-6888-1-git-send-email-longman@redhat.com> <1550089932-6888-3-git-send-email-longman@redhat.com> <20190214103333.GH32494@hirez.programming.kicks-ass.net> <9e01d4ef-56df-7af8-a0f5-b49644e298bf@redhat.com> <CAHk-=whKmuranma_HUOKBbDJHGmdWZr9MYW-+cmGzsOiJ2N1Sg@mail.gmail.com>
On Thu, Feb 14, 2019 at 9:51 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> The arm64 numbers scaled horribly even before, and that's because
> there is too much ping-pong, and it's probably because there is no
> "stickiness" to the cacheline to the core, and thus adding the extra
> loop can make the ping-pong issue even worse because now there is more
> of it.
Actually, if it's using the ll/sc, then I don't see why arm64 should
even change. It doesn't really even change the pattern: the initial
load of the value is just replaced with a "ll" that gets a non-zero
value, and then we re-try without even doing the "sc" part.

End result: exact same "load once, then do ll/sc to update". Just
using a slightly different instruction pattern.
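To make the "same pattern" argument concrete, here is a minimal userspace sketch in C11 atomics of the two trylock shapes being compared. It is not the kernel code from the patch: `struct sketch_rwsem`, the `SKETCH_*` constants and all function names are hypothetical, and the count layout (non-negative means that many readers, negative means a writer) is a simplification of the real rwsem count. `trylock_load_first()` does a plain load and then the compare-and-exchange; `trylock_guess_unlocked()` skips the load and feeds a guessed "unlocked" value straight into the compare-and-exchange, relying on a failed exchange to report the value it actually saw. On an LL/SC machine that failed exchange is roughly a load-exclusive, a compare and a branch that skips the store-exclusive, which is why the two shapes should look much the same to the memory system.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of a reader/writer count (NOT the
 * kernel's rwsem layout): count >= 0 is the number of readers holding
 * the lock, count < 0 means a writer holds it.
 */
#define SKETCH_UNLOCKED   0L
#define SKETCH_READ_BIAS  1L
#define SKETCH_WRITER    (-1L)

struct sketch_rwsem {
	atomic_long count;
};

static void sketch_init(struct sketch_rwsem *sem)
{
	atomic_init(&sem->count, SKETCH_UNLOCKED);
}

/* Shape 1: plain load first, then compare-and-exchange. */
static bool trylock_load_first(struct sketch_rwsem *sem)
{
	long tmp = atomic_load_explicit(&sem->count, memory_order_relaxed);

	while (tmp >= 0) {
		/* On failure, tmp is overwritten with the value actually seen. */
		if (atomic_compare_exchange_weak_explicit(&sem->count, &tmp,
				tmp + SKETCH_READ_BIAS,
				memory_order_acquire, memory_order_relaxed))
			return true;
	}
	return false;
}

/*
 * Shape 2: skip the load and guess "unlocked". The first access to the
 * count is the compare-and-exchange itself; if the guess is wrong, the
 * failed exchange hands back the real value and we retry with it.
 * On an LL/SC machine a failed exchange is roughly "load-exclusive,
 * compare, branch out before the store-exclusive".
 */
static bool trylock_guess_unlocked(struct sketch_rwsem *sem)
{
	long tmp = SKETCH_UNLOCKED;

	do {
		if (atomic_compare_exchange_weak_explicit(&sem->count, &tmp,
				tmp + SKETCH_READ_BIAS,
				memory_order_acquire, memory_order_relaxed))
			return true;
	} while (tmp >= 0);
	return false;
}

/* Drop one read hold; the caller must currently hold a read lock. */
static void sketch_up_read(struct sketch_rwsem *sem)
{
	long old = atomic_fetch_sub_explicit(&sem->count, SKETCH_READ_BIAS,
					     memory_order_release);

	assert(old >= SKETCH_READ_BIAS);
	(void)old; /* avoid an unused-variable warning under NDEBUG */
}
```

With a writer holding the lock (count negative) both functions return false without touching the count. With readers already present, the guessing version pays one failed exchange and then retries with the observed value, which on LL/SC hardware is the same "load once, then ll/sc" sequence as the load-first version.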
But maybe "ll" does something different to the cacheline than a regular "ld"?

Alternatively, the machine you used is using LSE, and the "swp" thing
has some horrid behavior when it fails.
So I take it back. I'm actually surprised that arm64 performs worse. I
don't think it should. But numbers talk, bullshit walks, and it
clearly does make for worse numbers on arm64.

Linus