Re: [v5 1/2] mm: disable interrupts while initializing deferred pages

On Tue, 13 Mar 2018 15:45:46 -0400 Pavel Tatashin <pasha.tatashin@xxxxxxxxxx> wrote:

> > > 
> > > We must remove cond_resched() because we can't sleep anymore. They were
> > > added to fight NMI timeouts, so I will replace them with
> > > touch_nmi_watchdog() in a follow-up fix.
> > 
> > This makes no sense.  Any code section where we can add cond_resched()
> > was never subject to NMI timeouts because that code cannot be running with
> > disabled interrupts.
> > 
> 
> Hi Andrew,
> 
> I was talking about this patch:
> 
> 9b6e63cbf85b89b2dbffa4955dbf2df8250e5375
> mm, page_alloc: add scheduling point to memmap_init_zone
> 
> Which adds cond_resched() to memmap_init_zone() to avoid NMI timeouts.
> 
> memmap_init_zone() is used both early in boot, when non-deferred struct
> pages are initialized, and possibly later, during memory hotplug.
> 
> As I understand it, the latter case could cause the timeout on
> non-preemptible kernels.
> 
> My understanding is that the same logic was used here when the
> cond_resched() calls were added.
> 
> Please correct me if I am wrong.

Yes, the message is a bit confusing and the terminology is perhaps
vague.  And it's been a while since I played with this stuff, so from
(dated) memory:

Soft lockup: kernel has run for too long without rescheduling
Hard lockup: kernel has run for too long with interrupts disabled

Both of these are detected by the NMI watchdog handler.

9b6e63cbf85b89b2d fixes a soft lockup by adding a manual rescheduling
point.  Replacing that with touch_nmi_watchdog() won't work (I think). 
Presumably calling touch_softlockup_watchdog() will "work", in that it
suppresses the warning.  But it won't fix the thing which the warning
is actually warning about: starvation of the CPU scheduler.  That's
what the cond_resched() does.
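
To make that distinction concrete, here is a toy userspace model.  It is
not kernel code: the names run_long_loop, THRESHOLD, BATCH and the three
strategies are made up for illustration, and time is a simulated tick
counter rather than anything the real watchdog uses.  It only sketches
the difference between resetting a detector's timestamp and actually
letting another task run.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name below is hypothetical, not a kernel API. */
enum strategy { DO_NOTHING, TOUCH_WATCHDOG, YIELD_CPU };

struct result {
	bool warned;   /* did the simulated soft-lockup check fire? */
	long max_wait; /* longest stretch the other task waited for the CPU */
};

#define THRESHOLD 20 /* ticks without a reset before the check warns */
#define BATCH      5 /* the loop applies its strategy every BATCH ticks */

struct result run_long_loop(long ticks, enum strategy s)
{
	struct result r = { false, 0 };
	long last_touch = 0; /* last tick the detector timestamp was reset */
	long last_sched = 0; /* last tick the other task got the CPU */

	assert(ticks > 0);

	for (long now = 1; now <= ticks; now++) {
		if (now % BATCH == 0) {
			if (s == TOUCH_WATCHDOG) {
				/* Silences the detector; nobody else runs. */
				last_touch = now;
			} else if (s == YIELD_CPU) {
				/* The other task runs, which also resets the detector. */
				last_sched = now;
				last_touch = now;
			}
		}
		if (now - last_touch > THRESHOLD)
			r.warned = true;
		if (now - last_sched > r.max_wait)
			r.max_wait = now - last_sched;
	}
	return r;
}
```

In this model, run_long_loop(100, TOUCH_WATCHDOG) never warns but still
leaves the other task waiting for all 100 ticks, exactly as with
DO_NOTHING.  Only YIELD_CPU both avoids the warning and bounds the wait
(to 4 ticks here).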

I'm not sure what to suggest, really.  Your changelog isn't the best:
"Vlastimil Babka reported about a window issue during which when
deferred pages are initialized, and the current version of on-demand
initialization is finished, allocations may fail".  Well...  where is
this mysterious window?  Without such detail it's hard for others to
suggest alternative approaches.



