On Wed, Mar 15, 2017 at 09:35:29AM +0100, Michal Hocko wrote:
> On Wed 15-03-17 01:14:27, Luis R. Rodriguez wrote:
> > On Tue, Mar 14, 2017 at 11:07:38AM -0700, Darrick J. Wong wrote:
> > > On Tue, Mar 14, 2017 at 05:57:45PM +0100, Luis R. Rodriguez wrote:
> > > > On Tue, Mar 07, 2017 at 04:35:28PM -0800, Darrick J. Wong wrote:
> > > > > The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
> > > > > it to grab 1-4 pages for staging of inobt records. The infinite loop in
> > > > > the greedy allocation function is causing hangs[1] in generic/269, so
> > > > > just get rid of the greedy allocator in favor of kmem_zalloc_large.
> > > > > This makes bulkstat somewhat more likely to ENOMEM if there's really no
> > > > > pages to spare, but eliminates a source of hangs.
> > > > >
> > > > > [1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com
> > > > >
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > ---
> > > > > v2: remove single-page fallback
> > > > > ---
> > > >
> > > > Since this fixes a hang how about *at the very least* a respective Fixes tag ?
> > > > This fixes an existing hang so what are the stable considerations here ? I
> > > > realize the answer is not easy but figured it's worth asking.
> > >
> > > I didn't think it was appropriate to "Fixes: 77e4635ae1917" since we're
> > > not fixing _greedy so much as we are killing it. The patch fixes an
> > > infinite retry hang when bulkstat tries a memory allocation that cannot
> > > be satisfied; and having done that, realizes there are no remaining
> > > callers of _greedy and garbage collects it. The code that was there
> > > before also seems capable of sleeping forever, I think.
> > >
> > > So the minimally invasive fix is to apply the allocation conversion in
> > > bulkstat, and if there aren't any other callers of _greedy then you can
> > > get rid of it too.
> >
> > For the sake of stable XFS users then why not do the less invasive change
> > first, Cc stable, and then move on to the less backward portable solution ?
>
> The thing is that the permanent failures for vmalloc were so unlikely
> prior to 5d17a73a2ebe ("vmalloc: back off when the current task is
> killed") that this was basically a non-issue before this (4.11) merge
> window.

I see, this seems like critical information to add to the commit log. Also,
will this be at least pushed to v4.11 ?

  Luis