On Thu 03-09-20 12:31:36, Andrew Morton wrote:
> On Thu, 3 Sep 2020 19:36:26 +0200 David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> > (still on vacation, back next week on Tuesday)
> >
> > I didn't look into discussions in v1, but to me this looks like we are
> > trying to hide an actual bug by implementing hacks in the caller
> > (repeated calls to drain_all_pages()). What about alloc_contig_range()
> > users - you get more allocation errors just because PCP code doesn't
> > play along.
> >
> > There *is* strong synchronization with the page allocator - however,
> > there seems to be one corner case race where we allow to allocate pages
> > from isolated pageblocks.
> >
> > I want that fixed instead if possible, otherwise this is just an ugly
> > hack to make the obvious symptoms (offlining looping forever) disappear.
> >
> > If that is not possible easily, I'd much rather want to see all
> > drain_all_pages() calls being moved to the caller and have the expected
> > behavior documented instead of specifying "there is no strong
> > synchronization with the page allocator" - which is wrong in all but PCP
> > cases (and there only in one possible race?).
> >
>
> It's a two-line hack which fixes a bug in -stable kernels, so I'm
> inclined to proceed with it anyway. We can undo it later on as part of
> a better fix, OK?

Agreed.

http://lkml.kernel.org/r/20200904070235.GA15277@xxxxxxxxxxxxxx for reference.

--
Michal Hocko
SUSE Labs