On Tue, 19 Dec 2017 11:50:10 +0300 Yury Norov <ynorov@xxxxxxxxxxxxxxxxxx> wrote:

> This benchmark sends many IPIs in different modes and measures the
> time for IPI delivery (first column) and the total time, i.e.
> including the time until the sender sees the receiver's
> acknowledgement (second column).
>
> The scenarios are:
>
> Dry-run:        do everything except actually sending the IPI.
>                 Useful to estimate system overhead.
> Self-IPI:       send an IPI to the current CPU.
> Normal IPI:     send an IPI to some other CPU.
> Broadcast IPI:  send a broadcast IPI to all online CPUs.
> Broadcast lock: send a broadcast IPI to all online CPUs and force
>                 them to acquire/release a spinlock.
>
> The raw output looks like this:
>
> [ 155.363374] Dry-run:        0,         2999696 ns
> [ 155.429162] Self-IPI:       30385328,  65589392 ns
> [ 156.060821] Normal IPI:     566914128, 631453008 ns
> [ 158.384427] Broadcast IPI:  0,         2323368720 ns
> [ 160.831850] Broadcast lock: 0,         2447000544 ns
>
> For virtualized guests, sending and receiving IPIs causes guest
> exits.  I used this test to measure the performance impact on the
> KVM subsystem of Christoffer Dall's series "Optimize KVM/ARM for VHE
> systems" [1].
>
> The test machine is a ThunderX2 with 112 online CPUs.  Below are the
> results, normalized to the host dry-run time; broadcast lock results
> are omitted.  Smaller is better.
>
> Host, v4.14:
> Dry-run:          0      1
> Self-IPI:         9     18
> Normal IPI:      81    110
> Broadcast IPI:    0   2106
>
> Guest, v4.14:
> Dry-run:          0      1
> Self-IPI:        10     18
> Normal IPI:     305    525
> Broadcast IPI:    0   9729
>
> Guest, v4.14 + [1]:
> Dry-run:          0      1
> Self-IPI:         9     18
> Normal IPI:     176    343
> Broadcast IPI:    0   9885

That looks handy.  Peter and Ingo might be interested.

I wonder if it should be in kernel/.  Perhaps it's better to accumulate
these things in lib/test_*.c rather than cluttering up other top-level
directories.

> +static ktime_t __init send_ipi(int flags)
> +{
> +	ktime_t time = 0;
> +	DEFINE_SPINLOCK(lock);

I have some vague historical memory that an on-stack spinlock can cause
problems, perhaps with debugging code.  Can't remember - maybe I
dreamed it.
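
If it wasn't a dream, the easy way to sidestep the question is to give
the lock static storage rather than putting it in send_ipi()'s stack
frame.  A rough sketch (untested, and the lock name is mine):

	/*
	 * File-scope lock instead of an on-stack one, so a remote
	 * CPU's IPI handler can never touch a stack slot which has
	 * already been unwound by the time send_ipi() returns.
	 */
	static DEFINE_SPINLOCK(bench_lock);

	static ktime_t __init send_ipi(int flags)
	{
		ktime_t time = 0;

		/* ... as before, passing &bench_lock to the handlers ... */

		return time;
	}

If memory serves, lockdep also dislikes statically-initialized locks
living in non-static storage, which may be the problem I'm
half-remembering; a file-scope DEFINE_SPINLOCK() avoids that too.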
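
As an aside, the "Normal IPI" round trip is easy to sanity-check even
without the patch.  Something along these lines (untested; the handler
and helper names here are made up):

	#include <linux/smp.h>
	#include <linux/ktime.h>

	static void ipi_handler(void *unused)
	{
		/* empty - we only care about delivery/ack latency */
	}

	/* Time one synchronous cross-call to @cpu; returns nanoseconds. */
	static s64 time_one_ipi(int cpu)
	{
		ktime_t t0 = ktime_get();

		/* wait=1: return only after @cpu has run ipi_handler() */
		smp_call_function_single(cpu, ipi_handler, NULL, 1);

		return ktime_to_ns(ktime_sub(ktime_get(), t0));
	}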