Hi Andrew,

On Mon, Aug 21, 2017 at 01:26:00PM -0700, Andrew Morton wrote:
> On Mon, 21 Aug 2017 23:08:18 +0800 Chen Yu <yu.c.chen@xxxxxxxxx> wrote:
>
> > There is a problem that when counting the pages for creating
> > the hibernation snapshot will take significant amount of
> > time, especially on system with large memory. Since the counting
> > job is performed with irq disabled, this might lead to NMI lockup.
> > The following warning were found on a system with 1.5TB DRAM:
> >
> > ...
> >
> > It has taken nearly 20 seconds(2.10GHz CPU) thus the NMI lockup
> > was triggered. In case the timeout of the NMI watch dog has been
> > set to 1 second, a safe interval should be 6590003/20 = 320k pages
> > in theory. However there might also be some platforms running at a
> > lower frequency, so feed the watchdog every 100k pages.
> >
> > ...
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2531,9 +2532,12 @@ void drain_all_pages(struct zone *zone)
> >  
> >  #ifdef CONFIG_HIBERNATION
> >  
> > +/* Touch watchdog for every WD_INTERVAL_PAGE pages. */
> > +#define WD_INTERVAL_PAGE	(100*1024)
> > +
> >  void mark_free_pages(struct zone *zone)
> >  {
> > -	unsigned long pfn, max_zone_pfn;
> > +	unsigned long pfn, max_zone_pfn, page_num = 0;
> >  	unsigned long flags;
> >  	unsigned int order, t;
> >  	struct page *page;
> > @@ -2548,6 +2552,9 @@ void mark_free_pages(struct zone *zone)
> >  		if (pfn_valid(pfn)) {
> >  			page = pfn_to_page(pfn);
> >  
> > +			if (!((page_num++) % WD_INTERVAL_PAGE))
> > +				touch_nmi_watchdog();
> > +
> >  			if (page_zone(page) != zone)
> >  				continue;
> >  
> > @@ -2561,8 +2568,11 @@ void mark_free_pages(struct zone *zone)
> >  			unsigned long i;
> >  
> >  			pfn = page_to_pfn(page);
> > -			for (i = 0; i < (1UL << order); i++)
> > +			for (i = 0; i < (1UL << order); i++) {
> > +				if (!((page_num++) % WD_INTERVAL_PAGE))
> > +					touch_nmi_watchdog();
> >  				swsusp_set_page_free(pfn_to_page(pfn + i));
> > +			}
> >  		}
> >  	}
> >  	spin_unlock_irqrestore(&zone->lock, flags);
>
> hm, is it really worth all the WD_INTERVAL_PAGE stuff?
> touch_nmi_watchdog() is pretty efficient and calling it once-per-page
> may not have a measurable effect.
>
Version 1 of the patch fed the dog once per page, and we thought it
might look more elegant to feed the dog every N pages.

> And if we're really concerned about the performance impact it would be
> better to make WD_INTERVAL_PAGE a power of 2 (128*1024?) to avoid the
> modulus operation.
>
OK, I'll change the interval to 128*1024 then.

Thanks,
	Yu
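
For reference, the power-of-2 suggestion can be sketched in userspace as below. This is only an illustration under stated assumptions, not the kernel code: touch_nmi_watchdog_stub(), wd_due_mod(), wd_due_mask() and walk_pages() are hypothetical names introduced here, and the stub merely counts calls. It shows that with WD_INTERVAL_PAGE set to 128*1024 the "every N pages" test can be written as a mask instead of a modulus.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdbool.h>

/* Interval proposed in the thread; the mask test needs a power of 2. */
#define WD_INTERVAL_PAGE	(128UL * 1024UL)

static_assert((WD_INTERVAL_PAGE & (WD_INTERVAL_PAGE - 1)) == 0,
	      "WD_INTERVAL_PAGE must be a power of 2");

/* Hypothetical stand-in for touch_nmi_watchdog(): it only counts calls. */
static unsigned long wd_touches;

static inline void touch_nmi_watchdog_stub(void)
{
	wd_touches++;
}

/* Interval test written with a modulus, as in the quoted patch. */
static inline bool wd_due_mod(unsigned long page_num)
{
	return page_num % WD_INTERVAL_PAGE == 0;
}

/* Same test written with a mask; valid only for a power-of-2 interval. */
static inline bool wd_due_mask(unsigned long page_num)
{
	return (page_num & (WD_INTERVAL_PAGE - 1)) == 0;
}

/*
 * Walk nr_pages simulated pages and feed the stub watchdog once per
 * interval. Returns how many times the stub was called.
 */
static inline unsigned long walk_pages(unsigned long nr_pages)
{
	unsigned long pfn, page_num = 0;

	wd_touches = 0;
	for (pfn = 0; pfn < nr_pages; pfn++) {
		if (wd_due_mask(page_num++))
			touch_nmi_watchdog_stub();
	}
	return wd_touches;
}
```

With this sketch, walk_pages(128UL * 1024UL + 1) calls the stub twice, at page 0 and at page 131072. For an unsigned counter and a constant power-of-2 interval, compilers will typically turn the modulus form into the same mask on their own, so the two helpers are expected to behave identically.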