Re: mm/compaction: BUG: NULL pointer dereference

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 05/24/2019 02:50 PM, Suzuki K Poulose wrote:
> Hi,
> 
> We are hitting NULL pointer dereferences while running stress tests with KVM.
> See splat [0]. The test is to spawn 100 VMs all doing standard debian
> installation (Thanks to Marc's automated scripts, available here [1] ).
> The problem has been reproduced with a better rate of success from 5.1-rc6
> onwards.
> 
> The issue is only reproducible with swapping enabled and the entire
> memory is used up, when swapping heavily. Also this issue is only reproducible
> on only one server with 128GB, which has the following memory layout:
> 
> [32GB@4GB, hole , 96GB@544GB]
> 
> Here is my non-expert analysis of the issue so far.
> 
> Under extreme memory pressure, the kswapd could trigger reset_isolation_suitable()
> to figure out the cached values for migrate/free pfn for a zone, by scanning through
> the entire zone. On our server it does so in the range of [ 0x10_0000, 0xa00_0000 ],
> with the following area of holes : [ 0x20_0000, 0x880_0000 ].
> In the failing case, we end up setting the cached migrate pfn as : 0x508_0000, which
> is right in the center of the zone pfn range. i.e ( 0x10_0000 + 0xa00_0000 ) / 2,
> with reset_migrate = 0x88_4e00, reset_free = 0x10_0000.
> 
> Now these cached values are used by the fast_isolate_freepages() to find a pfn. However,
> since we cant find anything during the search we fall back to using the page belonging
> to the min_pfn (which is the migrate_pfn), without proper checks to see if that is valid
> PFN or not. This is then passed on to fast_isolate_around() which tries to do :
> set_pageblock_skip(page) on the page which blows up due to an NULL mem_section pointer.
> 
> The following patch seems to fix the issue for me, but I am not quite convinced that
> it is the right fix. Thoughts ?
> 
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9febc8c..9e1b9ac 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
>  				page = pfn_to_page(highest);
>  				cc->free_pfn = highest;
>  			} else {
> -				if (cc->direct_compaction) {
> +				if (cc->direct_compaction && pfn_valid(min_pfn)) {
>  					page = pfn_to_page(min_pfn);

pfn_to_online_page() here would be better as it does not add pfn_valid() cost on
architectures which does not subscribe to CONFIG_HOLES_IN_ZONE. But regardless if
the compaction is trying to scan pfns in zone holes, then it should be avoided.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux