On 8/3/19 12:39 AM, Mike Kravetz wrote:
> From: Vlastimil Babka <vbabka@xxxxxxx>
>
> Mike Kravetz reports that "hugetlb allocations could stall for minutes or hours
> when should_compact_retry() would return true more often then it should.
> Specifically, this was in the case where compact_result was COMPACT_DEFERRED
> and COMPACT_PARTIAL_SKIPPED and no progress was being made."
>
> The problem is that the compaction_withdrawn() test in should_compact_retry()
> includes compaction outcomes that are only possible on low compaction priority,
> and results in a retry without increasing the priority. This may result in
> further reclaim, and more incomplete compaction attempts.
>
> With this patch, compaction priority is raised when possible, or
> should_compact_retry() returns false.
>
> The COMPACT_SKIPPED result doesn't really fit together with the other outcomes
> in compaction_withdrawn(), as that's a result caused by insufficient order-0
> pages, not due to low compaction priority. With this patch, it is moved to
> a new compaction_needs_reclaim() function, and for that outcome we keep the
> current logic of retrying if it looks like reclaim will be able to help.
>
> Reported-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> Tested-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>

There should also be your SOB, IIUC.

> ---
>  include/linux/compaction.h | 22 +++++++++++++++++-----
>  mm/page_alloc.c            | 16 ++++++++++++----
>  2 files changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 9569e7c786d3..4b898cdbdf05 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -129,11 +129,8 @@ static inline bool compaction_failed(enum compact_result result)
>  	return false;
>  }
>
> -/*
> - * Compaction has backed off for some reason. It might be throttling or
> - * lock contention. Retrying is still worthwhile.
> - */
> -static inline bool compaction_withdrawn(enum compact_result result)
> +/* Compaction needs reclaim to be performed first, so it can continue. */
> +static inline bool compaction_needs_reclaim(enum compact_result result)
>  {
>  	/*
>  	 * Compaction backed off due to watermark checks for order-0
> @@ -142,6 +139,16 @@ static inline bool compaction_withdrawn(enum compact_result result)
>  	if (result == COMPACT_SKIPPED)
>  		return true;
>
> +	return false;
> +}
> +
> +/*
> + * Compaction has backed off for some reason after doing some work or none
> + * at all. It might be throttling or lock contention. Retrying might be still
> + * worthwhile, but with a higher priority if allowed.
> + */
> +static inline bool compaction_withdrawn(enum compact_result result)
> +{
>  	/*
>  	 * If compaction is deferred for high-order allocations, it is
>  	 * because sync compaction recently failed. If this is the case
> @@ -207,6 +214,11 @@ static inline bool compaction_failed(enum compact_result result)
>  	return false;
>  }
>
> +static inline bool compaction_needs_reclaim(enum compact_result result)
> +{
> +	return false;
> +}
> +
>  static inline bool compaction_withdrawn(enum compact_result result)
>  {
>  	return true;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d3bb601c461b..af29c05e23aa 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3965,15 +3965,23 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>  	if (compaction_failed(compact_result))
>  		goto check_priority;
>
> +	/*
> +	 * compaction was skipped because there are not enough order-0 pages
> +	 * to work with, so we retry only if it looks like reclaim can help.
> +	 */
> +	if (compaction_needs_reclaim(compact_result)) {
> +		ret = compaction_zonelist_suitable(ac, order, alloc_flags);
> +		goto out;
> +	}
> +
>  	/*
>  	 * make sure the compaction wasn't deferred or didn't bail out early
>  	 * due to locks contention before we declare that we should give up.
> -	 * But do not retry if the given zonelist is not suitable for
> -	 * compaction.
> +	 * But the next retry should use a higher priority if allowed, so
> +	 * we don't just keep bailing out endlessly.
>  	 */
>  	if (compaction_withdrawn(compact_result)) {
> -		ret = compaction_zonelist_suitable(ac, order, alloc_flags);
> -		goto out;
> +		goto check_priority;
>  	}
>
>  	/*
>
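
FWIW, here is a rough userspace sketch of the retry decision as I read it from
the changelog and the hunks above. It is not kernel code: the enum values are
only a subset, the model_* names and PRIO_* constants are made up for
illustration, reclaim_can_help stands in for compaction_zonelist_suitable(),
and the retry-count logic that follows in should_compact_retry() is left out.
It assumes a lower priority value means a higher priority.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of compaction outcomes, for illustration only. */
enum compact_result {
	COMPACT_SKIPPED,
	COMPACT_DEFERRED,
	COMPACT_CONTENDED,
	COMPACT_PARTIAL_SKIPPED,
	COMPACT_COMPLETE,
};

/* Hypothetical priority levels; a lower value means a higher priority. */
enum compact_priority {
	PRIO_SYNC_FULL,
	PRIO_SYNC_LIGHT,
	PRIO_ASYNC,
};

/* Not enough order-0 pages: only reclaim can unblock compaction. */
static bool model_needs_reclaim(enum compact_result result)
{
	return result == COMPACT_SKIPPED;
}

/* Backed off early; these outcomes are tied to running at low priority. */
static bool model_withdrawn(enum compact_result result)
{
	return result == COMPACT_DEFERRED ||
	       result == COMPACT_CONTENDED ||
	       result == COMPACT_PARTIAL_SKIPPED;
}

/*
 * Simplified decision: a reclaim-limited outcome retries only if reclaim
 * can help; a withdrawn outcome retries only if the priority can still be
 * raised. Other outcomes are not modelled and never retry here.
 */
static bool model_should_retry(enum compact_result result,
			       bool reclaim_can_help,
			       enum compact_priority min_prio,
			       enum compact_priority *prio)
{
	assert(*prio >= min_prio);

	if (model_needs_reclaim(result))
		return reclaim_can_help;

	if (model_withdrawn(result) && *prio > min_prio) {
		(*prio)--;	/* raise priority for the next attempt */
		return true;
	}
	return false;
}

/* Count retries when every attempt keeps returning the same outcome. */
static int model_count_retries(enum compact_result result,
			       enum compact_priority start_prio,
			       enum compact_priority min_prio)
{
	enum compact_priority prio = start_prio;
	int retries = 0;

	while (model_should_retry(result, false, min_prio, &prio))
		retries++;
	return retries;
}
```

In this model a compaction attempt that keeps coming back as COMPACT_DEFERRED
or COMPACT_PARTIAL_SKIPPED can be retried at most once per priority step, so
the loop is bounded instead of spinning at the same priority.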