[patch 091/119] mm, compaction: drain pcps for zone when kcompactd fails

From: David Rientjes <rientjes@xxxxxxxxxx>
Subject: mm, compaction: drain pcps for zone when kcompactd fails

It's possible for free pages to become stranded on per-cpu pagesets
(pcps) that, if drained, could be merged with buddy pages on the zone's
free area to form large order pages, including up to MAX_ORDER.
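
To illustrate the stranding described above, here is a minimal userspace
sketch, not kernel code.  Every name in it (the toy_* helpers, the
16-page zone, the three page states) is hypothetical and exists only for
illustration.  It assumes a block of a given order is free only when
every page in the naturally aligned block is on the buddy free lists,
and it models a drain as simply moving pcp pages to those lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ORDER 4                       /* hypothetical: zone of 2^4 pages */
#define TOY_NR_PAGES  (1UL << TOY_MAX_ORDER)

/* Per-page state in this toy model (not a kernel type). */
enum toy_state { TOY_IN_USE, TOY_ON_PCP, TOY_BUDDY };

/* True if every page in [start, start + 2^order) is on the buddy lists. */
bool toy_block_free(const enum toy_state *pages, unsigned long start,
		    unsigned int order)
{
	for (unsigned long i = 0; i < (1UL << order); i++)
		if (pages[start + i] != TOY_BUDDY)
			return false;
	return true;
}

/* Largest order with a fully free, naturally aligned block; -1 if none. */
int toy_largest_free_order(const enum toy_state *pages)
{
	assert(pages != NULL);
	for (int order = TOY_MAX_ORDER; order >= 0; order--)
		for (unsigned long start = 0; start < TOY_NR_PAGES;
		     start += 1UL << order)
			if (toy_block_free(pages, start, (unsigned int)order))
				return order;
	return -1;
}

/* Model of a drain: pages held on pcps go back to the buddy lists. */
void toy_drain_pcps(enum toy_state *pages)
{
	for (unsigned long i = 0; i < TOY_NR_PAGES; i++)
		if (pages[i] == TOY_ON_PCP)
			pages[i] = TOY_BUDDY;
}
```

With all 16 pages marked TOY_BUDDY except one marked TOY_ON_PCP, the
largest free order in this model is one below the maximum; after
toy_drain_pcps() the whole zone forms a single maximum-order block.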

Consider a verbose example using the tools/vm/page-types tool at the
beginning of a ZONE_NORMAL ('B' indicates a buddy page and 'S'
indicates a slab page).  Pages on pcps do not have any page flags set.

109954  1       _______S________________________________________________________
109955  2       __________B_____________________________________________________
109957  1       ________________________________________________________________
109958  1       __________B_____________________________________________________
109959  7       ________________________________________________________________
109960  1       __________B_____________________________________________________
109961  9       ________________________________________________________________
10996a  1       __________B_____________________________________________________
10996b  3       ________________________________________________________________
10996e  1       __________B_____________________________________________________
10996f  1       ________________________________________________________________
...
109f8c  1       __________B_____________________________________________________
109f8d  2       ________________________________________________________________
109f8f  2       __________B_____________________________________________________
109f91  f       ________________________________________________________________
109fa0  1       __________B_____________________________________________________
109fa1  7       ________________________________________________________________
109fa8  1       __________B_____________________________________________________
109fa9  1       ________________________________________________________________
109faa  1       __________B_____________________________________________________
109fab  1       _______S________________________________________________________

The compaction migration scanner is attempting to defragment this
memory since it is at the beginning of the zone.  It has done so quite
well: all movable pages have been migrated.  In the pfn range [0x109955,
0x109fab), there are only buddy pages and pages without flags set.

The pages without flags may be stranded on pcps; if freed back to the
zone free area, they would allow this memory to be coalesced.  It is
possible that some of these pages are not on pcps and that something
has called alloc_pages() and used the memory directly, but we rely on
the absence of __GFP_MOVABLE in these cases to allocate from
MIGRATE_UNMOVABLE pageblocks to try to keep these MIGRATE_MOVABLE
pageblocks as free as possible.

These buddy and pcp pages, spanning 1,621 pages, could be coalesced and
allow for three transparent hugepages to be dynamically allocated. 
Running the numbers for all such spans on the system, it was found that
there were over 400 such spans of only buddy pages and pages without
flags set at the time this /proc/kpageflags sample was collected. 
Without this support, there were _no_ order-9 or order-10 pages free.
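
The span counting described above could be approximated along the
following lines.  This is a hedged sketch and not the tool that was
used to produce the numbers: the toy_record layout and the function
names are hypothetical, and it assumes each record carries the pfn,
page count and flags string of one page-types output row, with '_'
meaning "flag not set" and 'B' meaning buddy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One row of page-types style output (hypothetical layout). */
struct toy_record {
	unsigned long pfn;     /* first pfn of the row */
	unsigned long count;   /* number of consecutive pages in the row */
	const char *flags;     /* flags string, e.g. "__________B___..." */
};

/* A row qualifies if it has no flags set, or only the buddy flag. */
bool toy_record_is_free_like(const struct toy_record *r)
{
	return strspn(r->flags, "_B") == strlen(r->flags);
}

/*
 * Return the length in pages of the longest run of pfn-contiguous rows
 * that contain only buddy pages and pages without flags.  If start_pfn
 * is non-NULL, the first pfn of that run is stored there.
 */
unsigned long toy_longest_free_span(const struct toy_record *recs, size_t n,
				    unsigned long *start_pfn)
{
	unsigned long best_len = 0, best_start = 0;
	unsigned long run_len = 0, run_start = 0, run_end = 0;

	assert(recs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!toy_record_is_free_like(&recs[i])) {
			run_len = 0;            /* slab, LRU, etc. ends the run */
			continue;
		}
		if (run_len == 0 || recs[i].pfn != run_end) {
			run_start = recs[i].pfn; /* new run, or a pfn gap */
			run_len = 0;
		}
		run_len += recs[i].count;
		run_end = recs[i].pfn + recs[i].count;
		if (run_len > best_len) {
			best_len = run_len;
			best_start = run_start;
		}
	}
	if (start_pfn)
		*start_pfn = best_start;
	return best_len;
}
```

Feeding it the rows of the sample above would yield one candidate span
between the two slab pages; whether such a span can really be coalesced
still depends on the flagless pages being on pcps rather than in use.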

When kcompactd fails to defragment memory such that a cc.order page can
be allocated, drain all pcps for the zone back to the buddy allocator
so this stranding cannot occur.  Compaction for that order will
subsequently be deferred, which acts as a ratelimit on this drain.
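
How deferral ratelimits the drain can be sketched with a simplified
userspace model.  It is loosely patterned on the exponential backoff
in defer_compaction() and compaction_deferred(), but the toy_* names,
the struct layout and the assumption that every attempted compaction
fails are illustrative only and ignore details such as per-order
tracking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEFER_SHIFT 6   /* assumed cap: skip at most 2^6 attempts */

/* Hypothetical per-zone state for this model (not struct zone). */
struct toy_zone {
	unsigned int considered;   /* attempts seen since the last deferral */
	unsigned int defer_shift;  /* log2 of the current backoff window */
	unsigned long drains;      /* how many times pcps were drained */
};

/* Record a failure: restart the window and double it, up to the cap. */
void toy_defer(struct toy_zone *z)
{
	z->considered = 0;
	if (z->defer_shift < TOY_MAX_DEFER_SHIFT)
		z->defer_shift++;
}

/* True if this attempt should be skipped because of an earlier failure. */
bool toy_deferred(struct toy_zone *z)
{
	unsigned int limit = 1U << z->defer_shift;

	if (++z->considered > limit)
		z->considered = limit;
	return z->considered < limit;
}

/*
 * Simulate 'wakeups' kcompactd wakeups in which every attempted
 * compaction fails.  Each failure drains once and then defers.
 */
unsigned long toy_run_failing_wakeups(struct toy_zone *z, unsigned int wakeups)
{
	assert(z != NULL);
	for (unsigned int i = 0; i < wakeups; i++) {
		if (toy_deferred(z))
			continue;       /* skipped: no compaction, no drain */
		z->drains++;            /* stands in for the drain on failure */
		toy_defer(z);
	}
	return z->drains;
}
```

In this model the gap between drains doubles after each failure until
the cap is reached, so a zone that keeps failing is drained only a
handful of times over many wakeups.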

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803010340100.88270@xxxxxxxxxxxxxxxxxxxxxxxxx
Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/compaction.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff -puN mm/compaction.c~mm-compaction-drain-pcps-for-zone-when-kcompactd-fails mm/compaction.c
--- a/mm/compaction.c~mm-compaction-drain-pcps-for-zone-when-kcompactd-fails
+++ a/mm/compaction.c
@@ -1988,6 +1988,14 @@ static void kcompactd_do_work(pg_data_t
 			compaction_defer_reset(zone, cc.order, false);
 		} else if (status == COMPACT_PARTIAL_SKIPPED || status == COMPACT_COMPLETE) {
 			/*
+			 * Buddy pages may become stranded on pcps that could
+			 * otherwise coalesce on the zone's free area for
+			 * order >= cc.order.  This is ratelimited by the
+			 * upcoming deferral.
+			 */
+			drain_all_pages(zone);
+
+			/*
 			 * We use sync migration mode here, so we defer like
 			 * sync direct compaction does.
 			 */
_
--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


