On Fri, May 24, 2019 at 04:26:16PM +0530, Anshuman Khandual wrote:
>
>
> On 05/24/2019 02:50 PM, Suzuki K Poulose wrote:
> > Hi,
> >
> > We are hitting NULL pointer dereferences while running stress tests with KVM.
> > See splat [0]. The test is to spawn 100 VMs all doing standard debian
> > installation (Thanks to Marc's automated scripts, available here [1] ).
> > The problem has been reproduced with a better rate of success from 5.1-rc6
> > onwards.
> >
> > The issue is only reproducible with swapping enabled and the entire
> > memory used up, when swapping heavily. Also this issue is reproducible
> > on only one server with 128GB, which has the following memory layout:
> >
> > [32GB@4GB, hole , 96GB@544GB]
> >
> > Here is my non-expert analysis of the issue so far.
> >
> > Under extreme memory pressure, kswapd could trigger reset_isolation_suitable()
> > to figure out the cached values for migrate/free pfn for a zone, by scanning through
> > the entire zone. On our server it does so in the range of [ 0x10_0000, 0xa00_0000 ],
> > with the following area of holes : [ 0x20_0000, 0x880_0000 ].
> > In the failing case, we end up setting the cached migrate pfn as : 0x508_0000, which
> > is right in the center of the zone pfn range. i.e. ( 0x10_0000 + 0xa00_0000 ) / 2,
> > with reset_migrate = 0x88_4e00, reset_free = 0x10_0000.
> >
> > Now these cached values are used by fast_isolate_freepages() to find a pfn. However,
> > since we can't find anything during the search we fall back to using the page belonging
> > to the min_pfn (which is the migrate_pfn), without proper checks to see if that is a valid
> > PFN or not. This is then passed on to fast_isolate_around() which tries to do :
> > set_pageblock_skip(page) on the page, which blows up due to a NULL mem_section pointer.
> >
> > The following patch seems to fix the issue for me, but I am not quite convinced that
> > it is the right fix. Thoughts ?
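The arithmetic in the analysis above can be checked with a small userspace sketch. It uses only the PFN values quoted in the report; the helper names range_midpoint() and pfn_in_hole() are made up for illustration and are not kernel functions. Treating the upper hole bound as exclusive is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PFN values quoted in the report (zone bounds and hole bounds). */
#define ZONE_START_PFN 0x100000ULL   /* 0x10_0000 */
#define ZONE_END_PFN   0xa000000ULL  /* 0xa00_0000 */
#define HOLE_START_PFN 0x200000ULL   /* 0x20_0000 */
#define HOLE_END_PFN   0x8800000ULL  /* 0x880_0000 */

/* Hypothetical helper: midpoint of a PFN range, as in the report's formula. */
static uint64_t range_midpoint(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return (start + end) / 2;
}

/*
 * Hypothetical helper: true if pfn lies in [hole_start, hole_end).
 * The exclusive upper bound is an assumption made for this sketch.
 */
static bool pfn_in_hole(uint64_t pfn, uint64_t hole_start, uint64_t hole_end)
{
	return pfn >= hole_start && pfn < hole_end;
}
```

With these values, range_midpoint(ZONE_START_PFN, ZONE_END_PFN) gives 0x5080000, matching the cached migrate pfn in the report, and pfn_in_hole() confirms that this pfn falls inside the reported hole.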
> >
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 9febc8c..9e1b9ac 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
> >  				page = pfn_to_page(highest);
> >  				cc->free_pfn = highest;
> >  			} else {
> > -				if (cc->direct_compaction) {
> > +				if (cc->direct_compaction && pfn_valid(min_pfn)) {
> >  					page = pfn_to_page(min_pfn);
>
> pfn_to_online_page() here would be better as it does not add pfn_valid() cost on
> architectures which do not subscribe to CONFIG_HOLES_IN_ZONE. But regardless, if
> the compaction is trying to scan pfns in zone holes, then it should be avoided.

CONFIG_HOLES_IN_ZONE typically applies in special cases where an arch
punches holes within a section. As both do a section lookup, the cost is
similar, but pfn_valid in general is less subtle in this case. Normally
pfn_valid_within is only ok when a pfn_valid check has been made on the
max_order aligned range as well as a zone boundary check. In this case,
it's much more straightforward to leave it as pfn_valid.

-- 
Mel Gorman
SUSE Labs
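For readers following the thread, the shape of the guarded fallback in the proposed patch can be sketched as a userspace toy model. Nothing here is kernel code: toy_sections, toy_pfn_valid(), toy_pick_fallback() and the section size are all hypothetical stand-ins, chosen only to show why a section-lookup style validity check rejects a pfn that sits in a hole before it is ever turned into a page pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: names and sizes are illustrative, not kernel definitions. */
#define TOY_SECTION_SHIFT 4	/* 16 PFNs per toy section */
#define TOY_NR_SECTIONS   8

struct toy_section {
	bool present;
};

/* Sections 2..5 model a hole: no backing metadata exists for them. */
static const struct toy_section toy_sections[TOY_NR_SECTIONS] = {
	{ true },  { true },  { false }, { false },
	{ false }, { false }, { true },  { true },
};

/* Stand-in for a pfn_valid()-style check: is the PFN's section present? */
static bool toy_pfn_valid(uint64_t pfn)
{
	uint64_t nr = pfn >> TOY_SECTION_SHIFT;

	return nr < TOY_NR_SECTIONS && toy_sections[nr].present;
}

/*
 * Mirrors the condition in the patch: fall back to min_pfn only when
 * doing direct compaction and the PFN passes the validity check.
 * Returns true and stores the PFN in *out if the fallback may be used.
 */
static bool toy_pick_fallback(bool direct_compaction, uint64_t min_pfn,
			      uint64_t *out)
{
	assert(out != NULL);
	if (direct_compaction && toy_pfn_valid(min_pfn)) {
		*out = min_pfn;
		return true;
	}
	return false;
}
```

In this model a min_pfn of 0x40 (toy section 4, inside the hole) is refused, while 0x10 (toy section 1) is accepted, which is the behaviour the added pfn_valid(min_pfn) condition is meant to provide.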