Hi Vlastimil,

On Wed, Sep 28, 2016 at 11:00:15AM +0200, Vlastimil Babka wrote:
> On 09/28/2016 03:41 AM, Johannes Weiner wrote:
> > Hi guys,
> >
> > we noticed what looks like a regression in page mobility grouping
> > during an upgrade from 3.10 to 4.0. Identical machines, workloads, and
> > uptime, but /proc/pagetypeinfo on 3.10 looks like this:
> >
> > Number of blocks type    Unmovable  Reclaimable      Movable      Reserve      Isolate
> > Node 1, zone   Normal          815          433        31518            2            0
> >
> > and on 4.0 like this:
> >
> > Number of blocks type    Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
> > Node 1, zone   Normal         3880         3530        25356            2            0            0
>
> It's worth to keep in mind that this doesn't reflect where the actual
> unmovable pages reside. It might be that in 3.10 they are spread within
> the movable pages. IIRC enabling page_owner (not sure if in 4.0, there
> were some later fixes I think) can augment pagetypeinfo with at least
> some statistics of polluted pageblocks.

Thanks, I'll look at the mixed block counts.

I failed to make clear, we saw that issue in the switch from 3.10 to
4.0, and I mentioned those two kernels as last known good / first
known bad. But later kernels - we tried with 4.6 - look the same. This
appears to be a regression in (higher-order) allocation service
quality somewhere after 3.10 that persists into current kernels.

> Does e.g. /proc/meminfo suggest how much unmovable/reclaimable memory
> there should be allocated and if it would fill the respective
> pageblocks, or if they are poorly utilized?

They are very poorly utilized. On a machine with 90% anon/cache pages
alone we saw 50% of the page blocks unmovable.

> > 4.0 is either polluting pageblocks more aggressively at allocation, or
> > is not able to make pageblocks movable again when the reclaimable and
> > unmovable allocations are released. Invoking compaction manually
> > (/proc/sys/vm/compact_memory) is not bringing them back, either.
> >
> > The problem we are debugging is that these machines have a very high
> > rate of order-3 allocations (fdtable during fork, network rx), and
> > after the upgrade allocstalls have increased dramatically. I'm not
> > entirely sure this is the same issue, since even order-0 allocations
> > are struggling, but the mobility grouping in itself looks problematic.
> >
> > I'm still going through the changes relevant to mobility grouping in
> > that timeframe, but if this rings a bell for anyone, it would help. I
> > hate blaming random patches, but these caught my eye:
> >
> > 9c0415e mm: more aggressive page stealing for UNMOVABLE allocations
> > 3a1086f mm: always steal split buddies in fallback allocations
> > 99592d5 mm: when stealing freepages, also take pages created by splitting buddy page
>
> Check also the changelogs for mentions of earlier commits, e.g. 99592d5
> should be restoring behavior that changed in 3.12-3.13 and you are
> upgrading from 3.10.

Good point.

> > The changelog states that by aggressively stealing split buddy pages
> > during a fallback allocation we avoid subsequent stealing. But since
> > there are generally more movable/reclaimable pages available, and so
> > less falling back and stealing freepages on behalf of movable, won't
> > this mean that we could expect exactly that result - growing numbers
> > of unmovable blocks, while rarely stealing them back in movable alloc
> > fallbacks? And the expansion of !MOVABLE blocks would over time make
> > compaction less and less effective too, seeing as it doesn't consider
> > anything !MOVABLE suitable migration targets?
>
> Yeah this is an issue with compaction that was brought up recently and I
> want to tackle next.

Agreed, it would be nice if compaction could reclaim unmovable and
reclaimable blocks whose polluting allocations have since been freed.

But there is a limit to how lazy mobility grouping can be and still
expect compaction to fix it up.
If 50% of the page blocks are marked unmovable, we don't pack incoming
polluting allocations. When spread out the right way, even just a few
of those can have a devastating impact on overall compactability.

So regardless of future compaction improvements, we need to get
anti-frag accuracy in the allocator closer to 3.10 levels again.

> > Attached are the full /proc/pagetypeinfo and /proc/buddyinfo from both
> > kernels on machines with similar uptimes and directly after invoking
> > compaction. As you can see, the buddy lists are much more fragmented
> > on 4.0, with unmovable/reclaimable allocations polluting more blocks.
> >
> > Any thoughts on this would be greatly appreciated. I can test patches.
>
> I guess testing revert of 9c0415e could give us some idea. Commit
> 3a1086f shouldn't result in pageblock marking differences and as I said
> above, 99592d5 should be just restoring to what 3.10 did.

I can give this a shot, but note that this commit makes only unmovable
stealing more aggressive. We see reclaimable blocks up as well.

The workload is fairly variable, so it'll take about a day to smooth
out a meaningful average.

Thanks for your insights, Vlastimil!
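
As a rough cross-check of the pageblock counts quoted earlier in this
mail, the sketch below turns the two "Node 1, zone Normal" rows into
per-migratetype shares. The numbers are copied from the quoted
/proc/pagetypeinfo excerpts; the helper name block_shares is made up
for illustration and is not part of any kernel tooling.

```python
# Pageblock counts for "Node 1, zone Normal", copied from the
# /proc/pagetypeinfo excerpts quoted in this thread.
COUNTS_3_10 = {"Unmovable": 815, "Reclaimable": 433, "Movable": 31518,
               "Reserve": 2, "Isolate": 0}
COUNTS_4_0 = {"Unmovable": 3880, "Reclaimable": 3530, "Movable": 25356,
              "Reserve": 2, "CMA": 0, "Isolate": 0}


def block_shares(counts):
    """Return each migratetype's share of all pageblocks, in percent.

    Hypothetical helper, for illustration only.
    """
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


for label, counts in (("3.10", COUNTS_3_10), ("4.0", COUNTS_4_0)):
    shares = block_shares(counts)
    non_movable = shares["Unmovable"] + shares["Reclaimable"]
    print(f"{label}: total={sum(counts.values())} "
          f"unmovable={shares['Unmovable']:.1f}% "
          f"reclaimable={shares['Reclaimable']:.1f}% "
          f"unmovable+reclaimable={non_movable:.1f}%")
```

Both rows add up to 32768 pageblocks, so the two kernels are directly
comparable: unmovable plus reclaimable blocks go from roughly 3.8% of
the zone on 3.10 to roughly 22.6% on 4.0. This only counts how blocks
are marked; as noted above, it says nothing about where the unmovable
pages actually reside.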