Re: [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation

Steven Rostedt <rostedt@xxxxxxxxxxx> · Fri, 13 Dec 2024 16:00:35 -0500

On Fri, 13 Dec 2024 15:09:50 -0500
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> > What's the concern then? That PI may see an odd order of locks for this task ?
> > but it cannot do anything about it anyway, since the task won't schedule.
> > And before irq handler is over the B will be released and everything
> > will look normal again.  
> 
> The problem is the chain walk. It could also cause unwanted side effects in RT.
> 
> If low priority task 1 has lock A and is running on another CPU and low
> priority task 2 blocks on lock A and then is interrupted right before going
> to sleep as being "blocked on", and takes lock B in the interrupt context.
> We then have high priority task 3 on another CPU block on B which will then
> see that the owner of B is blocked (even though it is not blocked for B), it
> will boost its priority as well as the owner of the lock (A). The A owner
> will get boosted where it is not the task that is blocking the high
> priority task.
> 
> My point is that RT is all about deterministic behavior. It would require
> a pretty substantial audit to the PI logic to make sure that this doesn't
> cause any unexpected results.
> 
> My point is, the PI logic was not designed for taking a lock after being
> blocked on another lock. It may work, but we had better prove and show all
> side effects that can happen in these cases.

When B is released, task 2 will be unboosted, but task 1 will not. That's
because a task is only unboosted when it releases a lock (or a lock times
out, and then the waiter will unboost the chain, but that's not this case).

Task 1 will unboost when it finally releases lock A.

Another issue is that interrupts are not stopped by spin_lock_irq*() as in
RT spin_lock_irq*() is the same as spin_lock(). As spin_locks() are assumed
to only be in threaded irq context, there's no reason to actually disable
interrupts when taking one of these spin_lock_irq*().

That means we could have task 1 trying to take lock B too. If we have a
lock order of:

  A -> B

Where B is the trylock in the irq context, we have:

  CPU 1			CPU 2			CPU 3
  -----			-----			-----
 task 1 takes A
			task 2 blocks on A, gets interrupted, trylock B and succeeds:

 task 1 takes B (blocks)

						High priority task 3 blocks on B

Now we have this in the PI chain:

  Task 3 boosts tasks 2 but task 2 is blocked on A
  Task 3 then boosts owner of A which is task 1 which is blocked on lock B
  Task 3 then boosts owner of lock B which is task 2
  [ wash, rinse, repeat!!! ]

There is a safety valve in the code that will prevent an infinite loop, and
it will trigger a printk message or it may stop because the priorities are
equal, but still, this is not desirable and may even have other side
effects. If deadlock detection is enabled, this will trigger it.

Again, allowing spin_locks being taken in hard irq context in RT, even with
trylock is going to open a nasty can of worms that will make this less
deterministic and determinism is the entire point of RT. If we allow one
user to have spin_lock_trylock() in hard irq context, we have to allow
anyone to do it.

-- Steve