On Wed, Dec 11, 2024 at 3:55 AM Vlastimil Babka <vbabka@xxxxxxx> wrote: > > On 12/11/24 11:53, Vlastimil Babka wrote: > > On 12/10/24 03:39, Alexei Starovoitov wrote: > >> From: Alexei Starovoitov <ast@xxxxxxxxxx> > >> > >> Similar to local_lock_irqsave() introduce local_trylock_irqsave(). > >> It uses spin_trylock in PREEMPT_RT and always succeeds when !RT. > > > > Hmm but is that correct to always succeed? If we're in an nmi, we might be > > preempting an existing local_(try)lock_irqsave() critical section because > > disabling irqs doesn't stop NMI's, right? > > So unless I'm missing something, it would need to be a new kind of local > lock to support this trylock operation on !RT? Ohh. Correct. Forgot about nmi interrupting local_lock_irqsave region in !RT. > Perhaps based on the same > principle of a simple active/locked flag that I tried in my sheaves RFC? [1] > There could be also the advantage that if all (potentially) irq contexts > (not just nmi) used trylock, it would be sufficient to disable preeemption > and not interrupts, which is cheaper. I don't think it's the case. pushf was slow on old x86. According to https://www.agner.org/optimize/instruction_tables.pdf it's 3 uops on skylake. That could be faster than preempt_disable (incl %gs:addr) which is 3-4 uops assuming cache hit. > The RT variant could work as you proposed here, that was wrong in my RFC as > you already pointed out when we discussed v1 of this series. > > [1] > https://lore.kernel.org/all/20241112-slub-percpu-caches-v1-5-ddc0bdc27e05@xxxxxxx/ I like your +struct local_tryirq_lock approach, but let's put it in local_lock.h ? and it probably needs local_inc_return() instead of READ/WRITE_ONCE. With irq and nmis it's racy. In the meantime I think I will fix below: > >> +#define __local_trylock_irqsave(lock, flags) \ > >> + ({ \ > >> + local_irq_save(flags); \ > >> + local_trylock_acquire(this_cpu_ptr(lock)); \ > >> + 1; \ > >> + }) as #define __local_trylock_irqsave(lock, flags) \ ({ \ local_irq_save(flags); \ local_trylock_acquire(this_cpu_ptr(lock)); \ !in_nmi(); \ }) I think that's good enough for memcg patch 4 and doesn't grow local_lock_t on !RT. We can introduce typedef struct { int count; #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; struct task_struct *owner; #endif } local_trylock_t; and the whole set of local_trylock_lock, local_trylock_unlock,... But that's quite a bit of code. Feels a bit overkill for memcg patch 4. At this point it feels that adding 'int count' to existing local_lock_t is cleaner.