On Thu 22-06-23 19:58:33, Tetsuo Handa wrote: > On 2023/06/22 16:18, Michal Hocko wrote: > >>> It is explained as the first deadlock scenario in commit 1007843a9190 > >>> ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock"). > >>> We have to disable IRQ before making zonelist_update_seq.seqcount odd. > >>> > >> > >> Since we must replace local_irq_save() + write_seqlock() with write_seqlock_irqsave() for > >> CONFIG_PREEMPT_RT=y case but we must not replace local_irq_save() + write_seqlock() with > >> write_seqlock_irqsave() for CONFIG_PREEMPT_RT=n case, the proper fix is something like below? > > > > Now, I am confused. Why write_seqlock_irqsave is not allowed for !RT? > > Let me quote the changelog and he scenario 1: > > write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd > > // e.g. timer interrupt handler runs at this moment > > some_timer_func() { > > kmalloc(GFP_ATOMIC) { > > __alloc_pages_slowpath() { > > read_seqbegin(&zonelist_update_seq) { > > // spins forever because zonelist_update_seq.seqcount is odd > > } > > } > > } > > } > > // e.g. timer interrupt handler finishes > > write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even > > > > This is clearly impossible with write_seqlock_irqsave as interrupts are > > disabled before the lock is taken. > > Well, it seems that "I don't want to replace" rather than "we must not replace". OK, so this is an alteranative fix rather the proposed fix being incorrect. > I reread the thread but I couldn't find why nobody suggested write_seqlock_irqsave(). > The reason I proposed the > > local_irq_save() => printk_deferred_enter() => write_seqlock() > > ordering implies a precaution in case write_seqlock() involves printk() (e.g. lockdep, > KCSAN, soft-lockup warning), in addition to "local_irq_save() before printk_deferred_enter()" > requirement. Maybe people in that thread were happy with preserving this precaution... Precaution is a fair argument. I am not sure it is the strongest one to justify the ugly RT special casing though. I would propose to go with Sebastian's patch as a clear fix and if you really care about the pre-caution then make sure you describe potential problems. > You commented > > There shouldn't be any other locks (apart from hotplug) taken in that path IIRC. > > at https://lkml.kernel.org/ZCrYQj+2/uMtqNBm@xxxxxxxxxxxxxx . > > If __build_all_zonelists() is already serialized by hotplug lock, we don't > need to call spin_lock(&zonelist_update_seq.lock) and we will be able to > replace write_seqlock(&zonelist_update_seq) with > write_seqcount_begin(&zonelist_update_seq.seqcount) like > cpuset_change_task_nodemask() does? Maybe, I haven't really dived into this deeper. One way or the other RT requires a special IRQ handling along with the seq lock, no? -- Michal Hocko SUSE Labs