Re: [Bug 189181] New: BUG: unable to handle kernel NULL pointer dereference in mem_cgroup_node_nr_lru_pages

Johannes Weiner <hannes@xxxxxxxxxxx> · Wed, 30 Nov 2016 13:16:53 -0500

Hi Michal,

On Wed, Nov 30, 2016 at 06:00:40PM +0100, Michal Hocko wrote:
> > > [   15.665196] BUG: unable to handle kernel NULL pointer dereference at
> > > 0000000000000400
> > > [   15.665213] IP: [<ffffffff8122d520>] mem_cgroup_node_nr_lru_pages+0x20/0x40
> > > [   15.665225] PGD 0 
> > > [   15.665230] Oops: 0000 [#1] SMP
> > > [   15.665235] Modules linked in: fuse xt_nat xen_netback xt_REDIRECT
> > > nf_nat_redirect ip6table_filter ip6_tables xt_conntrack ipt_MASQUERADE
> > > nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
> > > nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack intel_rapl
> > > x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel
> > > ghash_clmulni_intel pcspkr dummy_hcd udc_core u2mfn(O) 
> > > xen_blkback xenfs xen_privcmd xen_blkfront
> > > [   15.665285] CPU: 0 PID: 60 Comm: kswapd0 Tainted: G           O   
> > > 4.8.10-12.pvops.qubes.x86_64 #1
> > > [   15.665292] task: ffff880011863b00 task.stack: ffff880011868000
> > > [   15.665297] RIP: e030:[<ffffffff8122d520>]  [<ffffffff8122d520>]
> > > mem_cgroup_node_nr_lru_pages+0x20/0x40
> > > [   15.665307] RSP: e02b:ffff88001186bc70  EFLAGS: 00010293
> > > [   15.665311] RAX: 0000000000000000 RBX: ffff88001186bd20 RCX:
> > > 0000000000000002
> > > [   15.665317] RDX: 000000000000000c RSI: 0000000000000000 RDI:
> > > 0000000000000000
> 
> I cannot generate a similar code to yours but the above suggests that we
> are getting NULL memcg. This would suggest a global reclaim and
> count_shadow_nodes misinterprets that because it does
> 
> 	if (memcg_kmem_enabled()) {
> 		pages = mem_cgroup_node_nr_lru_pages(sc->memcg, sc->nid,
> 						     LRU_ALL_FILE);
> 	} else {
> 		pages = node_page_state(NODE_DATA(sc->nid), NR_ACTIVE_FILE) +
> 			node_page_state(NODE_DATA(sc->nid), NR_INACTIVE_FILE);
> 	}
> 
> this might be a race with kmem enabling AFAICS. Anyway I believe that
> the above check needs to be extended for the sc->memcg != NULL

Yep, my locally built code looks very different from the report, but
it's clear that memcg is NULL. I didn't see the race you mention, but
it makes sense to me: shrink_slab() is supposed to filter memcg-aware
shrinkers based on whether we have a memcg or not, but it only does it
when kmem accounting is enabled; if it's disabled, the shrinker should
also use its non-memcg behavior. However, nothing prevents a memcg
with kmem from onlining between the filter and the shrinker run.
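
To make the interleaving concrete, here is a rough userspace model of it.
This is only a sketch: struct memcg, kmem_enabled, filter_skips(),
shrinker_would_deref_null() and replay_race() are hypothetical stand-ins
I made up for illustration, not the kernel's definitions, and the race is
replayed sequentially rather than with real concurrency.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
struct memcg { unsigned long file_pages; };

static bool kmem_enabled;	/* models memcg_kmem_enabled() */

/*
 * Models the filter: a shrinker is skipped only when kmem accounting is
 * enabled and its memcg-awareness does not match the presence of a memcg.
 */
static bool filter_skips(const struct memcg *memcg, bool memcg_aware)
{
	return kmem_enabled && ((memcg != NULL) != memcg_aware);
}

/*
 * Models the shrinker's later check: true means it would take the memcg
 * path with a NULL memcg, which is the state seen in the report.
 */
static bool shrinker_would_deref_null(const struct memcg *memcg)
{
	return kmem_enabled && memcg == NULL;
}

/* Replays the interleaving; returns true if the bad state is reached. */
static bool replay_race(void)
{
	const struct memcg *memcg = NULL;	/* global reclaim */

	kmem_enabled = false;
	if (filter_skips(memcg, true))
		return false;	/* filtered out, shrinker never runs */

	kmem_enabled = true;	/* a kmem-enabled memcg comes online */
	return shrinker_would_deref_null(memcg);
}
```

In this model the filter lets the memcg-aware shrinker through because
kmem is still off, and by the time the shrinker does its own check the
flag has flipped, so it picks the memcg path with no memcg.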

> diff --git a/mm/workingset.c b/mm/workingset.c
> index 617475f529f4..0f07522c5c0e 100644
> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -348,7 +348,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
>  	shadow_nodes = list_lru_shrink_count(&workingset_shadow_nodes, sc);
>  	local_irq_enable();
>  
> -	if (memcg_kmem_enabled()) {
> +	if (memcg_kmem_enabled() && sc->memcg) {
>  		pages = mem_cgroup_node_nr_lru_pages(sc->memcg, sc->nid,
>  						     LRU_ALL_FILE);
>  	} else {

If we do that, I'd remove the racy memcg_kmem_enabled() check
altogether and just check for whether we have a memcg or not.
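
Roughly what I have in mind, as a userspace sketch rather than a patch.
The types, the node_file_pages[] counters and both helper functions are
made-up stand-ins (assumptions for illustration only); the point is just
that the branch depends on sc->memcg alone, so there is no separate flag
that can change between the filter and the shrinker.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types and helpers. */
struct memcg { unsigned long file_pages; };
struct shrink_control { struct memcg *memcg; int nid; };

/* Made-up per-node file page counters, indexed by node id. */
static unsigned long node_file_pages[2] = { 1000, 2000 };

/* Stand-in for the per-memcg lookup; only called with a non-NULL memcg. */
static unsigned long memcg_file_pages(const struct memcg *memcg, int nid)
{
	(void)nid;	/* the model keeps a single counter per memcg */
	return memcg->file_pages;
}

/* Branch on the memcg pointer itself instead of a global enable flag. */
static unsigned long count_file_pages(const struct shrink_control *sc)
{
	if (sc->memcg)
		return memcg_file_pages(sc->memcg, sc->nid);
	return node_file_pages[sc->nid];
}
```

With a NULL memcg this always falls back to the node-wide counters, no
matter when kmem accounting gets switched on.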

What do you think, Vladimir?
