On 1/4/19 1:50 PM, Mel Gorman wrote:
> Pageblocks are marked for skip when no pages are isolated after a scan.
> However, it's possible to hit corner cases where the migration scanner
> gets stuck near the boundary between the source and target scanner. Due
> to pages being migrated in blocks of COMPACT_CLUSTER_MAX, pages that
> are migrated can be reallocated before the pageblock is complete. The
> pageblock is not necessarily skipped so it can be rescanned multiple
> times. Similarly, a pageblock with some dirty/writeback pages may fail
> to isolate and be rescanned until writeback completes which is wasteful.

     ^ migrate? If we failed to isolate, then it wouldn't bump nr_isolated.

Wonder if we could do better checks and not isolate pages that cannot be
migrated at the moment anyway.

>
> This patch tracks if a pageblock is being rescanned. If so, then the entire
> pageblock will be migrated as one operation. This narrows the race window
> during which pages can be reallocated during migration. Secondly, if there
> are pages that cannot be isolated then the pageblock will still be fully
> scanned and marked for skipping. On the second rescan, the pageblock skip
> is set and the migration scanner makes progress.
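
To make sure I read the rescan tracking the way it is described above, here
is a minimal userspace sketch. It is not the code from the patch. The helper
names (pageblock_start, is_rescan, isolate_limit), the pageblock order of 9
and the cluster size of 32 are assumptions made for illustration only.

```c
#include <stdbool.h>

/* Assumed for illustration: 512 pages per pageblock. */
#define PAGEBLOCK_ORDER     9UL
#define PAGEBLOCK_NR_PAGES  (1UL << PAGEBLOCK_ORDER)
/* Stand-in for COMPACT_CLUSTER_MAX; the value is an assumption here. */
#define CLUSTER_MAX         32UL

/* First pfn of the pageblock that contains pfn. */
static unsigned long pageblock_start(unsigned long pfn)
{
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/*
 * A scan that starts in the same pageblock as the last migrated page
 * is treated as a rescan of that pageblock.
 */
static bool is_rescan(unsigned long last_migrated_pfn, unsigned long start_pfn)
{
	return pageblock_start(last_migrated_pfn) == pageblock_start(start_pfn);
}

/*
 * How many pages one pass may cover: a normal pass stops after one
 * cluster, a rescan takes everything up to the end of the pageblock.
 */
static unsigned long isolate_limit(bool rescan, unsigned long start_pfn)
{
	unsigned long end = pageblock_start(start_pfn) + PAGEBLOCK_NR_PAGES;
	unsigned long left = end - start_pfn;

	if (rescan)
		return left;
	return left < CLUSTER_MAX ? left : CLUSTER_MAX;
}
```

With these assumptions, a pass starting at pfn 600 covers 32 pages normally
and 424 pages (the rest of the block) when flagged as a rescan, which is
where the narrower reallocation window would come from.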
>
>                                     4.20.0                 4.20.0
>                           finishscan-v2r15         norescan-v2r15
> Amean     fault-both-3      3729.80 (   0.00%)     2872.13 *  23.00%*
> Amean     fault-both-5      5148.49 (   0.00%)     4330.56 *  15.89%*
> Amean     fault-both-7      7393.24 (   0.00%)     6496.63 (  12.13%)
> Amean     fault-both-12    11709.32 (   0.00%)    10280.59 (  12.20%)
> Amean     fault-both-18    16626.82 (   0.00%)    11079.19 *  33.37%*
> Amean     fault-both-24    19944.34 (   0.00%)    17207.80 *  13.72%*
> Amean     fault-both-30    23435.53 (   0.00%)    17736.13 *  24.32%*
> Amean     fault-both-32    23948.70 (   0.00%)    18509.41 *  22.71%*
>
>                                4.20.0                 4.20.0
>                      finishscan-v2r15         norescan-v2r15
> Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
> Percentage huge-3        88.39 (   0.00%)       96.87 (   9.60%)
> Percentage huge-5        92.07 (   0.00%)       94.63 (   2.77%)
> Percentage huge-7        91.96 (   0.00%)       93.83 (   2.03%)
> Percentage huge-12       93.38 (   0.00%)       92.65 (  -0.78%)
> Percentage huge-18       91.89 (   0.00%)       93.66 (   1.94%)
> Percentage huge-24       91.37 (   0.00%)       93.15 (   1.95%)
> Percentage huge-30       92.77 (   0.00%)       93.16 (   0.42%)
> Percentage huge-32       87.97 (   0.00%)       92.58 (   5.24%)
>
> The fault latency reduction is large and while the THP allocation
> success rate is only slightly higher, it's already high at this
> point of the series.
>
> Compaction migrate scanned    60718343.00    31772603.00
> Compaction free scanned      933061894.00    63267928.00

Hm, I thought the order of magnitude difference between migrate and free
scanned was already gone at this point, as reported in the previous 2
patches. Or is this from a different system/configuration?

Anyway, encouraging result. I would expect that after "Keep migration
source private to a single compaction instance" sets the skip bits much
earlier and more aggressively, the rescans would not happen anymore thanks
to those, even if cached pfns were not updated.

> Migration scan rates are reduced by 48% and free scan rates are
> also reduced as the same migration source block is not being selected
> multiple times.
> The corner case where migration scan rates go through the
> roof due to a dirty/writeback pageblock located at the boundary of the
> migration/free scanner did not happen in this case. When it does happen,
> the scan rates multiple by factors measured in the hundreds and would be
> misleading to present.
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
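
For reference, the scan-rate reductions can be recomputed from the two
"Compaction ... scanned" counters quoted in the changelog. The snippet below
is only a sanity check; the helper names are made up for this mail. It gives
roughly 47.7% for migrate scanned, consistent with the quoted 48%, and
roughly 93% for free scanned.

```c
#include <stdio.h>

/* Percentage reduction going from before to after. */
static double reduction_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Print the reductions for the counters quoted in the changelog. */
static void print_scan_reductions(void)
{
	/* finishscan-v2r15 vs norescan-v2r15, values copied from the changelog */
	printf("migrate scanned: %.1f%%\n",
	       reduction_pct(60718343.0, 31772603.0));
	printf("free scanned:    %.1f%%\n",
	       reduction_pct(933061894.0, 63267928.0));
}
```

The same helper applied to the fault-both-3 row (3729.80 vs 2872.13) comes
out close to the 23.00% shown in the latency table.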