On Tue, 13 Mar 2018 15:45:46 -0400 Pavel Tatashin <pasha.tatashin@xxxxxxxxxx> wrote:

> > >
> > > We must remove cond_resched() because we can't sleep anymore. They were
> > > added to fight NMI timeouts, so I will replace them with
> > > touch_nmi_watchdog() in a follow-up fix.
> >
> > This makes no sense. Any code section where we can add cond_resched()
> > was never subject to NMI timeouts because that code cannot be running with
> > disabled interrupts.
> >
>
> Hi Andrew,
>
> I was talking about this patch:
>
> 9b6e63cbf85b89b2dbffa4955dbf2df8250e5375
> mm, page_alloc: add scheduling point to memmap_init_zone
>
> Which adds cond_resched() to memmap_init_zone() to avoid NMI timeouts.
>
> memmap_init_zone() is used both, early in boot, when non-deferred struct
> pages are initialized, but also may be used later, during memory hotplug.
>
> As I understand, the later case could cause the timeout on non-preemptible
> kernels.
>
> My understanding, is that the same logic was used here when cond_resched()s
> were added.
>
> Please correct me if I am wrong.

Yes, the message is a bit confusing and the terminology is perhaps
vague. And it's been a while since I played with this stuff, so from
(dated) memory:

Soft lockup: kernel has run for too long without rescheduling
Hard lockup: kernel has run for too long with interrupts disabled

Both of these are detected by the NMI watchdog handler.

9b6e63cbf85b89b2d fixes a soft lockup by adding a manual rescheduling
point. Replacing that with touch_nmi_watchdog() won't work (I think).
Presumably calling touch_softlockup_watchdog() will "work", in that it
suppresses the warning. But it won't fix the thing which the warning
is actually warning about: starvation of the CPU scheduler. That's
what the cond_resched() does.

I'm not sure what to suggest, really.

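
To make the distinction above concrete, here is a toy userspace model. It
is not kernel code and does not call any kernel API: the enum values, the
struct and simulate_long_loop() are all hypothetical names invented for
illustration, and the "ticks" and "threshold" are arbitrary units. It
assumes only what is said above: touching the watchdog silences the
detector, while a rescheduling point also lets other tasks run.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. Every name here is hypothetical, not a kernel API. */
enum loop_policy {
	POLICY_NONE,           /* long loop with no relief at all */
	POLICY_TOUCH_WATCHDOG, /* models touching the soft-lockup watchdog */
	POLICY_COND_RESCHED,   /* models a manual rescheduling point */
};

struct sim_result {
	bool warned;            /* did the modelled detector fire? */
	unsigned long max_wait; /* longest run (ticks) without rescheduling */
};

struct sim_result simulate_long_loop(enum loop_policy policy,
				     unsigned long total_ticks,
				     unsigned long relief_every,
				     unsigned long threshold)
{
	struct sim_result r = { false, 0 };
	unsigned long since_touch = 0; /* ticks since detector was reset */
	unsigned long since_sched = 0; /* ticks since other tasks could run */

	assert(relief_every > 0);

	for (unsigned long t = 1; t <= total_ticks; t++) {
		since_touch++;
		since_sched++;

		/* Record the state before any relief point is applied. */
		if (since_touch > threshold)
			r.warned = true;
		if (since_sched > r.max_wait)
			r.max_wait = since_sched;

		if (t % relief_every != 0)
			continue;

		if (policy == POLICY_TOUCH_WATCHDOG) {
			/* Silences the detector, scheduler still starved. */
			since_touch = 0;
		} else if (policy == POLICY_COND_RESCHED) {
			/* Other tasks run, so the detector is satisfied too. */
			since_touch = 0;
			since_sched = 0;
		}
	}
	return r;
}
```

With 1000 ticks, a relief point every 10 ticks and a threshold of 100,
POLICY_NONE warns and starves for all 1000 ticks, POLICY_TOUCH_WATCHDOG
does not warn but still starves for 1000 ticks, and POLICY_COND_RESCHED
neither warns nor starves for more than 10 ticks.
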
Your changelog isn't the best: "Vlastimil Babka reported about a window
issue during which when deferred pages are initialized, and the current
version of on-demand initialization is finished, allocations may fail".
Well... where is this mysterious window? Without such detail it's hard
for others to suggest alternative approaches.