Re: [PATCH 0/2] fix for "pathological THP behavior"

Andrea Arcangeli <aarcange@xxxxxxxxxx> · Mon, 20 Aug 2018 11:19:05 -0400

Hi Kirill,

On Mon, Aug 20, 2018 at 02:58:18PM +0300, Kirill A. Shutemov wrote:
> I personally prefer to prioritize NUMA locality over THP
> (__GFP_COMPACT_ONLY variant), but I don't know page-alloc/compaction good
> enough to Ack it.

If we go in this direction, it'd be nice, after fixing the showstopper
bug, to proceed with an orthogonal optimization: check the watermarks,
and if they show there are no PAGE_SIZEd pages available in the local
node, remove both __GFP_THISNODE and __GFP_COMPACT_ONLY.

If instead there's still PAGE_SIZEd free memory in the local node
(which can't be compacted for whatever reason), we should stick to
__GFP_THISNODE | __GFP_COMPACT_ONLY.
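
To make the two cases above concrete, here is a minimal userspace
sketch of the decision, not kernel code. The sk_ names and the bit
values are hypothetical and only stand in for the real gfp flags, and
the boolean argument stands in for the result of a PAGE_SIZE
(order-0) watermark check on the local node.

```c
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real gfp encodings. */
#define SK_GFP_THISNODE     (1u << 0)  /* stands in for __GFP_THISNODE */
#define SK_GFP_COMPACT_ONLY (1u << 1)  /* stands in for __GFP_COMPACT_ONLY */

/*
 * Hypothetical helper: local_has_base_pages is assumed to be the
 * outcome of a PAGE_SIZE watermark check on the local node.
 */
static unsigned int sk_thp_adjust_gfp(unsigned int gfp,
				      bool local_has_base_pages)
{
	/* Local node still has 4k pages: stay local, compaction only. */
	if (local_has_base_pages)
		return gfp | SK_GFP_THISNODE | SK_GFP_COMPACT_ONLY;

	/* Local node is completely full: let THP come from other nodes. */
	return gfp & ~(SK_GFP_THISNODE | SK_GFP_COMPACT_ONLY);
}
```

Any other bits in the mask pass through untouched; only the two
locality-related bits are flipped.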

It's orthogonal because the above addition would make sense also in
the current (buggy) code.

The main implementation issue is that mempolicy.c is not a good place
to do the watermark checking, but it is currently the place where
__GFP_THISNODE and __GFP_COMPACT_ONLY would be cleared.

The case where the local node gets completely full, without even 4k
pages available, should be totally common: if you keep allocating and
you allocate more than the size of a NUMA node, you will eventually
fill the local node with THP and then consume all the 4k pages. At
that point the current code is totally unable to allocate THP from
the other nodes. That would be possible to fix by removing
__GFP_THISNODE | __GFP_COMPACT_ONLY after the PAGE_SIZE watermark
check.
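
The full-node scenario can be modelled with a toy two-node sketch.
Everything here is an assumption for illustration: the sk_ names are
hypothetical, fragmentation and compaction are ignored, and a node is
reduced to a count of free 4k pages. It only shows how the outcome
differs once the local node has nothing left.

```c
#include <stdbool.h>

#define SK_PAGES_PER_THP 512u	/* 2M THP / 4k base pages, as on x86-64 */

struct sk_node {
	unsigned int free_pages;	/* free 4k pages in this node */
};

/* Take one THP worth of pages from the node, if it has that many. */
static bool sk_take_thp(struct sk_node *n)
{
	if (n->free_pages < SK_PAGES_PER_THP)
		return false;
	n->free_pages -= SK_PAGES_PER_THP;
	return true;
}

/*
 * Toy THP fault: always try the local node first. Go to the remote
 * node only if the caller allows it and the local node has no 4k
 * pages left at all. Returning false means "fall back to a 4k page".
 */
static bool sk_thp_fault(struct sk_node *local, struct sk_node *remote,
			 bool drop_thisnode_when_full)
{
	if (sk_take_thp(local))
		return true;
	if (drop_thisnode_when_full && local->free_pages == 0)
		return sk_take_thp(remote);
	return false;
}
```

With drop_thisnode_when_full false, a full local node never gets a
THP even though the remote node has plenty free; with it true, the
remote node serves the THP, while a local node that still has some 4k
pages keeps the allocation local.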

I'm mentioning this optimization in this context, even if it's
orthogonal, because the alternative patch that prioritizes THP over
NUMA locality for MADV_HUGEPAGE and defer=always would solve that too
with a one-liner, with no need for watermark checking or flipping gfp
bits whatsoever. Once the local node is full, THPs keep being
provided as expected.

Thanks,
Andrea