Re: [PATCH] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu 22-06-23 19:58:33, Tetsuo Handa wrote:
> On 2023/06/22 16:18, Michal Hocko wrote:
> >>> It is explained as the first deadlock scenario in commit 1007843a9190
> >>> ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock").
> >>> We have to disable IRQ before making zonelist_update_seq.seqcount odd.
> >>>
> >>
> >> Since we must replace local_irq_save() + write_seqlock() with write_seqlock_irqsave() for
> >> CONFIG_PREEMPT_RT=y case but we must not replace local_irq_save() + write_seqlock() with
> >> write_seqlock_irqsave() for CONFIG_PREEMPT_RT=n case, the proper fix is something like below?
> > 
> > Now, I am confused. Why write_seqlock_irqsave is not allowed for !RT?
> > Let me quote the changelog and he scenario 1:
> >         write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
> >         // e.g. timer interrupt handler runs at this moment
> >           some_timer_func() {
> >             kmalloc(GFP_ATOMIC) {
> >               __alloc_pages_slowpath() {
> >                 read_seqbegin(&zonelist_update_seq) {
> >                   // spins forever because zonelist_update_seq.seqcount is odd
> >                 }
> >               }
> >             }
> >           }
> >         // e.g. timer interrupt handler finishes
> >         write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
> > 
> > This is clearly impossible with write_seqlock_irqsave as interrupts are
> > disabled before the lock is taken.
> 
> Well, it seems that "I don't want to replace" rather than "we must not replace".

OK, so this is an alteranative fix rather the proposed fix being
incorrect.

> I reread the thread but I couldn't find why nobody suggested write_seqlock_irqsave().
> The reason I proposed the
> 
>   local_irq_save() => printk_deferred_enter() => write_seqlock()
> 
> ordering implies a precaution in case write_seqlock() involves printk() (e.g. lockdep,
> KCSAN, soft-lockup warning), in addition to "local_irq_save() before printk_deferred_enter()"
> requirement. Maybe people in that thread were happy with preserving this precaution...

Precaution is a fair argument. I am not sure it is the strongest one to
justify the ugly RT special casing though. I would propose to go with 
Sebastian's patch as a clear fix and if you really care about the
pre-caution then make sure you describe potential problems.

> You commented
> 
>   There shouldn't be any other locks (apart from hotplug) taken in that path IIRC.
> 
> at https://lkml.kernel.org/ZCrYQj+2/uMtqNBm@xxxxxxxxxxxxxx .
> 
> If __build_all_zonelists() is already serialized by hotplug lock, we don't
> need to call spin_lock(&zonelist_update_seq.lock) and we will be able to
> replace write_seqlock(&zonelist_update_seq) with
> write_seqcount_begin(&zonelist_update_seq.seqcount) like
> cpuset_change_task_nodemask() does?

Maybe, I haven't really dived into this deeper. One way or the other
RT requires a special IRQ handling along with the seq lock, no?

-- 
Michal Hocko
SUSE Labs




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux