Re: Kernel Benchmarking

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Mon, 14 Sep 2020 13:53:28 -0700

On Mon, Sep 14, 2020 at 1:21 PM Matthieu Baerts
<matthieu.baerts@xxxxxxxxxxxx> wrote:
>
> Recently, some of these packetdrill tests have been failing after 2
> minutes (timeout) instead of being executed in a few seconds (~6
> seconds). No packets are even exchanged during these two minutes.

Hmm.

That sounds like a deadlock to me, and sounds like it's a latent bug
waiting to happen.

One way I can see that happening (with the fair page locking) is to do
something like this:

thread A does:
  lock_page()
    do something

thread B:
  lock_page() - ends up blocking on the lock

thread A continues:
  unlock_page() - for the fair case this now transfers the page lock
                  to thread B
  .. do more work
  lock_page() - this now blocks because B already owns the lock

thread B continues:
  do something that requires A to have continued, but A is blocked on
  B, and we have a classic ABBA deadlock

and the only difference here is that with the unfair locks, thread A
would get the page lock and finish whatever it did, and you'd never
see the deadlock.

And by "never" I mean "very very seldom". That's why it sounds like a
latent bug to me - the fact that it triggers with the fair locks
really makes me suspect that it *could* have triggered with the unfair
locks, it just never really did, because we didn't have that
synchronous lock transfer to the waiter.
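
To make the handoff difference concrete, here is a toy, single-threaded
replay of the A/B sequence above. It is only a sketch under stated
assumptions: the PageLock class, its methods and run_scenario() are
hypothetical names made up for this illustration, not kernel code, and
the model simply assumes that in the unfair case thread A re-takes the
lock before the woken thread B gets to run.

```python
from collections import deque


class PageLock:
    """Toy model of a page lock (hypothetical, not the kernel implementation)."""

    def __init__(self, fair):
        self.fair = fair          # True: unlock hands the lock to the first waiter
        self.owner = None
        self.waiters = deque()

    def lock(self, thread):
        """Try to take the lock; return True on success, False if 'thread' must wait."""
        if self.owner is None:
            self.owner = thread
            if thread in self.waiters:
                self.waiters.remove(thread)
            return True
        if thread not in self.waiters:
            self.waiters.append(thread)
        return False

    def unlock(self, thread):
        if self.owner != thread:
            raise RuntimeError(f"{thread} does not own the lock")
        self.owner = None
        if self.fair and self.waiters:
            # Fair case: ownership transfers synchronously to the first waiter.
            self.owner = self.waiters.popleft()
        # Unfair case: waiters are merely woken and must retry lock() later.


def run_scenario(fair):
    """Replay the A/B sequence and report whether it completes or deadlocks."""
    page = PageLock(fair)
    page.lock("A")                    # thread A: lock_page()
    page.lock("B")                    # thread B: lock_page(), blocks behind A
    page.unlock("A")                  # thread A: unlock_page()
    if not page.lock("A"):            # thread A: lock_page() again
        # Assumption from the scenario: B owns the lock but needs A to
        # make progress, while A is waiting on B.
        return "deadlock"
    page.unlock("A")                  # A finished its work
    page.lock("B")                    # B retries and finally gets the lock
    page.unlock("B")
    return "completes"


if __name__ == "__main__":
    for fair in (False, True):
        print("fair" if fair else "unfair", "->", run_scenario(fair))
```

In this model the unfair run completes only because A wins the race to
re-take the lock; if B had run first, the unfair case would hang in the
same way, which is the "latent bug" point.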

One of the problems with the page lock is that it's a very very
special lock, and afaik has never worked with lockdep. So we have
absolutely _zero_ coverage of even the simplest ABBA deadlocks with
the page lock.

> I would be glad to help by validating new modifications or providing new
> info. My setup is also easy to put in place: a Docker image is built
> with all required tools to start the same VM just like the one I have.

I'm not familiar enough with packetdrill or any of that infrastructure
- does it do its own kernel modules etc for the packet latency
testing?

But it sounds like it's 100% repeatable with the fair page lock, which
is actually a good thing. It means that if you do a "sysrq-w" while
it's blocking, you should see exactly what is waiting for what.

(Except since it times out nicely eventually, probably at least part
of the waiting is interruptible, and then you need to do "sysrq-t"
instead and it's going to be _very_ verbose and much harder to
pinpoint things, and you'll probably need to have a very big printk
buffer).

There are obviously other ways to do it too - kgdb or whatever - which
you may or may not be more used to.

But sysrq is very traditional and often particularly easy if it's a
very repeatable "things are hung". Not nearly as good as lockdep, of
course. But if the machine is otherwise working, you can just do

    echo 'w' > /proc/sysrq-trigger

in another terminal (and again, maybe you need 't', but then you
really want to do it *without* having a full GUI setup or anything
like that, to at least make it somewhat less verbose).

Aside: a quick google shows that Nick Piggin did try to extend lockdep
to the page lock many many years ago. I don't think it ever went
anywhere. To quote Avril Lavigne: "It's complicated".

                 Linus