On Sun, 2 Feb 2025 20:34:48 +0100
Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:

> On Sun, Feb 2, 2025 at 2:55 PM David Laight
> <david.laight.linux@xxxxxxxxx> wrote:
> >
> > On Sat, 1 Feb 2025 22:00:06 +0000
> > Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > > On Sat, Feb 01, 2025 at 09:51:05PM +0000, David Laight wrote:
> > > > I'm not sure what you mean.
> > > > Disabling interrupts isn't as cheap as it ought to be, but probably isn't
> > > > that bad.
> > >
> > > Time it.  You'll see.
> >
> > The best scheme I've seen is to just increment a per-cpu value.
> > Let the interrupt happen, notice it isn't allowed and return with
> > interrupts disabled.
> > Then re-issue the interrupt when the count is decremented to zero.
> > Easy with level sensitive interrupts.
> > But I don't think Linux ever uses that scheme.
>
> I presume you are talking about the splhigh/splx set of primitives
> from Unix kernels.
>
> While "entering" is indeed cheap, undoing the work still needs to be
> atomic vs interrupts.
>
> I see NetBSD uses local cmpxchg8b on the interrupt level and interrupt
> mask, while the rest takes the irq trip.
>
> The NetBSD solution is still going to be visibly slower than not
> messing with any of it as spin_unlock on amd64 is merely a store of 0
> and cmpxchg even without the lock prefix costs several clocks.
>
> Maybe there is other hackery which could be done, but see below.

I was thinking it might be possible to merge an 'interrupts disabled'
count with the existing 'pre-emption disabled' count.
IIRC (on x86 at least) this is just a per-cpu variable accessed from
%fs/%gs.
So you add one to disable pre-emption and (say) 1<<16 to disable
interrupts.
If an interrupt happens while the count is 'big', the count value is
changed so that the last decrement of 1<<16 will set carry (or
overflow), and a return from interrupt is done that leaves interrupts
disabled (traditionally easy).
The interrupt enable call just subtracts the 1<<16 and checks for carry
(or overflow); if not set all is fine, if set it needs to call something
to re-issue the interrupt - that is probably the hard bit.
[An illustrative sketch of this counting scheme is appended after the
end of this mail.]

> > > > > So while this is indeed a tradeoff, as I understand the sane default
> > > > > is to *not* disable interrupts unless necessary.
> > > >
> > > > I beg to differ.
> > >
> > > You're wrong.  It is utterly standard to take spinlocks without
> > > disabling IRQs.  We do it all over the kernel.  If you think that needs
> > > to change, then make your case, don't throw a driveby review.
> > >
> > > And I don't mean by arguing.  Make a change, measure the difference.
> >
> > The analysis was done on some userspace code that basically does:
> > for (;;) {
> > 	pthread_mutex_enter(lock);
> > 	item = get_head(list);
> > 	if (!item)
> > 		break;
> > 	pthread_mutex_exit(lock);
> > 	process(item);
> > }
> > For the test there were about 10000 items on the list and 30 threads
> > processing it (that was the target of the tests).
> > The entire list needs to be processed in 10ms (RTP audio).
> > There was a bit more code with the mutex held, but only 100 or so
> > instructions.
> > Mostly it works fine, some threads get delayed by interrupts (etc) but
> > the other threads carry on working and all the items get processed.
> >
> > However sometimes an interrupt happens while the mutex is held.
> > In that case the other 29 threads get stuck waiting for the mutex.
> > No progress is made until the interrupt completes and it overruns
> > the 10ms period.
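[Editorial note, not part of the thread: the test program itself is not
included above, so the following is a minimal, self-contained
reconstruction of the scenario David describes - 30 threads draining a
mutex-protected list of 10000 items - with all names, sizes and the
trivial process() stub being assumptions.]

/* Reconstruction of the described contention test (assumed details). */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NITEMS   10000		/* "about 10000 items on the list" */
#define NTHREADS 30		/* "30 threads processing it" */

struct item {
	struct item *next;
	int payload;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct item *head;

static void process(struct item *item)
{
	/* Stand-in for the real per-item work (RTP audio in the original). */
	item->payload++;
}

static void *worker(void *arg)
{
	for (;;) {
		/* Lock held only for the unlink - ~100 instructions in the real test. */
		pthread_mutex_lock(&lock);
		struct item *item = head;
		if (!item) {
			pthread_mutex_unlock(&lock);
			break;
		}
		head = item->next;
		pthread_mutex_unlock(&lock);
		process(item);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];

	/* Build the work list. */
	for (int i = 0; i < NITEMS; i++) {
		struct item *it = malloc(sizeof(*it));
		it->payload = i;
		it->next = head;
		head = it;
	}

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);

	/* In the real test the whole list must drain within 10 ms. */
	return 0;
}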
> >
> > While this is a userspace test, the same thing will happen with
> > spin locks in the kernel.
> >
> > In userspace you can't disable interrupts, but for kernel spinlocks
> > you can.
> >
> > The problem is likely to show up as unexpected latency affecting
> > code with a hot mutex that is only held for short periods while
> > running a lot of network traffic.
> > That is also latency that affects all CPUs at the same time.
> > The interrupt itself will always cause latency to one CPU.
>
> Nobody is denying there is potential that lock hold time will get
> significantly extended if you get unlucky enough vs interrupts. It is
> questioned whether defaulting to irqs off around lock-protected areas
> is the right call.

I really commented because you were changing one lock which could easily
be 'hot' enough for there to be side effects, without even a comment
about any pitfalls.

	David

> As I noted in my previous e-mail the spin_lock_irq stuff disables
> interrupts upfront and does not touch them afterwards even when
> waiting for the lock to become free. Patching that up with queued
> locks may be non-trivial, if at all possible. Thus contention on
> irq-disabled locks *will* add latency to their handling unless this
> gets addressed. Note maintaining the forward-progress guarantee in the
> locking mechanism is non-negotiable, so punting to TAS or similar
> unfair locks does not cut it.
>
> This is on top of having to solve the overhead problem for taking the
> trips (see earlier in the e-mail).
>
> I would argue if the network stuff specifically is known to add
> visible latency, then perhaps that's something to investigate.
>
> Anyhow, as Willy said, you are welcome to code this up and demonstrate
> it is better overall.
>
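[Editorial note, not part of the thread: a rough, userspace-only sketch
of the merged 'pre-emption disabled' / 'interrupts disabled' count that
David describes earlier in this mail. It is illustration only - no real
kernel APIs are used, a _Thread_local variable stands in for the per-cpu
counter reached via %fs/%gs, and the carry/overflow trick is replaced by
an explicit pending flag for clarity.]

/* Illustrative sketch only: one counter, low bits for pre-emption
 * nesting, bit 16 and up for "soft" interrupt-disable nesting. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PREEMPT_OFFSET	1u
#define IRQ_OFF_OFFSET	(1u << 16)
#define IRQ_OFF_MASK	(~0u << 16)

static _Thread_local uint32_t cpu_count;	/* stand-in for the per-cpu count */
static _Thread_local bool irq_pending;		/* an interrupt was held back */

static void soft_preempt_disable(void) { cpu_count += PREEMPT_OFFSET; }
static void soft_preempt_enable(void)  { cpu_count -= PREEMPT_OFFSET; }

static void soft_irq_disable(void)
{
	cpu_count += IRQ_OFF_OFFSET;	/* cheap: no cli, no atomic */
}

static void soft_irq_enable(void)
{
	cpu_count -= IRQ_OFF_OFFSET;
	/*
	 * If the count dropped out of "irqs disabled" territory and an
	 * interrupt arrived meanwhile, it has to be replayed here -
	 * re-issuing it is the hard bit David mentions.
	 */
	if (!(cpu_count & IRQ_OFF_MASK) && irq_pending) {
		irq_pending = false;
		printf("replaying deferred interrupt\n");
	}
}

/* What the low-level interrupt entry would check in this scheme. */
static bool interrupt_allowed_now(void)
{
	if (cpu_count & IRQ_OFF_MASK) {
		irq_pending = true;	/* note it and return with irqs still off */
		return false;
	}
	return true;
}

int main(void)
{
	soft_preempt_disable();	/* low bits: pre-emption nesting */
	soft_irq_disable();	/* high bits: e.g. around a lock-protected region */

	/* An interrupt arriving now is noted and deferred, not serviced. */
	if (!interrupt_allowed_now())
		/* entry code would return with hardware irqs left disabled */;

	soft_irq_enable();	/* count leaves the "big" range: replay point */
	soft_preempt_enable();
	return 0;
}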