I agree with Daniel: we should look into an approach where pgdat_resize_lock
is taken only for the duration of updating tracking values such as
pgdat->first_deferred_pfn (perhaps we would need to add another tracker that
shows the chunks that are currently being worked on). The bulk of the struct
page initialization should happen outside of this lock, and the lock should be
taken only when we update globally visible data structures: lists and
tracking variables.

This way we can solve several problems:

1. Allow interrupt threads to grow zones if required.
2. Keep jiffies happy.
3. Allow future scaling when we add inner node threads to initialize
   struct pages (i.e. ktasks from Daniel).

Pasha

On Thu, Mar 26, 2020 at 2:58 PM Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> wrote:
>
> On Thu, Mar 19, 2020 at 03:05:12PM -0400, Daniel Jordan wrote:
> > Regardless,
> >     Reviewed-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
>
> Darn, I spoke too soon.
>
> On a two-socket Xeon, smaller values of TICK_PAGE_COUNT caused the deferred
> init timestamp to grow by over 25%.  This was with pgdatinit0 bound to the
> timer interrupt CPU to make sure the issue always reproduces.
>
>     TICK_PAGE_COUNT     node 0 deferred
>                         init time (ms)
>     ---------------     ---------------
>                4096                 610
>                8192                 587
>               16384                 487
>               32768                 480   // used in the patch
>
> Instead of trying to find a constant that lets the timer interrupt run often
> enough, I think a better way forward is to reconsider how we handle the resize
> lock.  I plan to prototype something and reply back with what I get.
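
For illustration only, here is a minimal userspace sketch of the locking scheme proposed at the top of this message. It uses pthreads rather than kernel primitives, and every name in it (struct node_state, claim_chunk(), init_worker(), run_deferred_init(), CHUNK_PAGES, MAX_WORKERS) is hypothetical and not taken from the patch. The mutex stands in for pgdat_resize_lock, and a byte array stands in for the struct pages. The point is just the shape: the lock covers claiming a chunk and publishing progress, while the long initialization loop runs unlocked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define CHUNK_PAGES 1024UL   /* hypothetical chunk size */
#define MAX_WORKERS 16       /* hypothetical cap on worker threads */

/* Hypothetical stand-in for the per-node state behind pgdat_resize_lock. */
struct node_state {
    pthread_mutex_t resize_lock;      /* plays the role of pgdat_resize_lock */
    unsigned long first_deferred_pfn; /* first pfn not yet claimed by anyone */
    unsigned long end_pfn;            /* one past the last pfn of the node */
    unsigned long pages_done;         /* globally visible progress counter */
    unsigned char *page_init;         /* stand-in for the struct pages */
};

/* Claim the next chunk. Only the tracking value is touched under the lock. */
static int claim_chunk(struct node_state *n, unsigned long *start,
                       unsigned long *end)
{
    int claimed = 0;

    pthread_mutex_lock(&n->resize_lock);
    assert(n->first_deferred_pfn <= n->end_pfn);
    if (n->first_deferred_pfn < n->end_pfn) {
        *start = n->first_deferred_pfn;
        *end = *start + CHUNK_PAGES;
        if (*end > n->end_pfn)
            *end = n->end_pfn;
        n->first_deferred_pfn = *end;
        claimed = 1;
    }
    pthread_mutex_unlock(&n->resize_lock);
    return claimed;
}

static void *init_worker(void *arg)
{
    struct node_state *n = arg;
    unsigned long start, end, pfn;

    while (claim_chunk(n, &start, &end)) {
        /* The long part runs without the lock held. */
        for (pfn = start; pfn < end; pfn++)
            n->page_init[pfn] = 1;

        /* Retake the lock briefly to publish the result. */
        pthread_mutex_lock(&n->resize_lock);
        n->pages_done += end - start;
        pthread_mutex_unlock(&n->resize_lock);
    }
    return NULL;
}

/* Initialize npages pages with nthreads workers; returns the number of
 * pages published as done, or 0 if setup failed or a page was missed. */
static unsigned long run_deferred_init(unsigned long npages, int nthreads)
{
    struct node_state n;
    pthread_t tids[MAX_WORKERS];
    unsigned long pfn, done;
    int i, started = 0;

    if (npages == 0 || nthreads < 1 || nthreads > MAX_WORKERS)
        return 0;
    n.page_init = calloc(npages, 1);
    if (!n.page_init)
        return 0;
    pthread_mutex_init(&n.resize_lock, NULL);
    n.first_deferred_pfn = 0;
    n.end_pfn = npages;
    n.pages_done = 0;

    for (i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, init_worker, &n) == 0)
            started++;
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    done = n.pages_done;
    for (pfn = 0; pfn < npages; pfn++)
        if (!n.page_init[pfn])
            done = 0;

    pthread_mutex_destroy(&n.resize_lock);
    free(n.page_init);
    return done;
}
```

Calling run_deferred_init(10000, 4) should return 10000 once all workers have been joined. A real implementation would also have to deal with interrupt context and zone growing, which this sketch does not attempt to model.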