Alex Kogan's on July 17, 2019 12:45 am:
>
>> On Jul 16, 2019, at 7:47 AM, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>>
>> Alex Kogan's on July 16, 2019 5:25 am:
>>> Our evaluation shows that CNA also improves performance of user
>>> applications that have hot pthread mutexes. Those mutexes are
>>> blocking, and waiting threads park and unpark via the futex
>>> mechanism in the kernel. Given that kernel futex chains, which
>>> are hashed by the mutex address, are each protected by a
>>> chain-specific spin lock, the contention on a user-mode mutex
>>> translates into contention on a kernel-level spin lock.
>>
>> What applications are those, what performance numbers? Arguably that's
>> much more interesting than microbenchmarks (which are mainly useful to
>> help ensure the fast paths are not impacted IMO).
>
> Those are applications that use locks in which waiting threads can park
> (block), e.g., pthread mutexes. Under (user-level) contention, the
> park-unpark mechanism in the kernel creates contention on (kernel) spin
> locks protecting futex chains. As an example, we experimented with
> LevelDB (a key-value store) and included performance numbers in the
> patch. Or are you looking for something else?

Oh, no, that's good. I confused myself thinking that was another
will-it-scale benchmark.

The speedup becomes significant on readrandom. I wonder if it might be
that you're gating which threads get to complete the futex operation,
so the effect is amplified beyond just the critical section of the spin
lock?

Am I reading the table correctly: does this test get about a 2.1x
speedup when scaling from 1 to 142 threads in the patch-CNA case?

Thanks,
Nick