Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()

Michal Hocko <mhocko@xxxxxxxx> · Mon, 30 Jan 2023 09:48:45 +0100

On Mon 30-01-23 09:16:13, Kefeng Wang wrote:
> 
> 
> On 2023/1/30 5:48, Andrew Morton wrote:
> > On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
> > 
> > > As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
> > 
> > Merged in 2017.
> > 
> > > hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
> > > could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
> > > occurs a NULL pointer dereference, let's do not record the foreign
> > > writebacks for folio memcg is null in mem_cgroup_track_foreign() to
> > > fix it.
> > > 
> > > Reported-by: Ma Wupeng <mawupeng1@xxxxxxxxxx>
> > > Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
> > 
> > Merged in 2019.
> > 
> > > --- a/include/linux/memcontrol.h
> > > +++ b/include/linux/memcontrol.h
> > > @@ -1688,10 +1688,13 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio,
> > >   static inline void mem_cgroup_track_foreign_dirty(struct folio *folio,
> > >   						  struct bdi_writeback *wb)
> > >   {
> > > +	struct mem_cgroup *memcg;
> > > +
> > >   	if (mem_cgroup_disabled())
> > >   		return;
> > > -	if (unlikely(&folio_memcg(folio)->css != wb->memcg_css))
> > > +	memcg = folio_memcg(folio);
> > > +	if (unlikely(memcg && &memcg->css != wb->memcg_css))
> > >   		mem_cgroup_track_foreign_dirty_slowpath(folio, wb);
> > >   }
> > 
> > Has this null deref actually been observed, or is this from code
> > inspection?  (This is why it's nice to include the Link: after a
> > Reported-by!)
> 
> It does occurs in our internal test and report by wupeng(based v5.10),
> 
> BUG: KASAN: user-memory-access in
> mem_cgroup_track_foreign_dirty_slowpath+0xc0/0x480 mm/memcontrol.c:4708
> Read of size 8 at addr 0000000000001000 by task syz-executor.2/28325
> 
> CPU: 2 PID: 28325 Comm: syz-executor.2 Tainted: G        W
> 5.10.0-03333-g48e46a146cbc-dirty #1
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  ...
>  mem_cgroup_track_foreign_dirty_slowpath+0xc0/0x480 mm/memcontrol.c:4708
>  mem_cgroup_track_foreign_dirty include/linux/memcontrol.h:1880 [inline]
>  account_page_dirtied+0x9a0/0xa90 mm/page-writeback.c:2436
>  __set_page_dirty+0x1f8/0x4b0 fs/buffer.c:608
>  __set_page_dirty_buffers+0x3d0/0x550 fs/buffer.c:668
>  set_page_dirty+0x158/0x500 mm/page-writeback.c:2575
>  filemap_page_mkwrite+0x3dc/0x490 mm/filemap.c:3224
>  do_page_mkwrite+0xc4/0x3d0 mm/memory.c:2786
>  wp_page_shared+0x14c/0x980 mm/memory.c:3118
>  do_wp_page+0x930/0xbc0 mm/memory.c:3219
>  handle_pte_fault+0x5e0/0x630 mm/memory.c:4570
>  __handle_mm_fault+0x41c/0x910 mm/memory.c:4690
>  handle_mm_fault+0x25c/0x484 mm/memory.c:4788
>  __do_page_fault arch/arm64/mm/fault.c:440 [inline]
>  do_page_fault+0x3ac/0x9d4 arch/arm64/mm/fault.c:539

Just to make sure I understand. The page has been hwpoisoned, uncharged
but stayed in the page cache so a next page fault on the address has blowned
up?

Say we address the NULL memcg case. What is the resulting behavior?
Doesn't userspace access a poisoned page and get a silend memory
corruption?
-- 
Michal Hocko
SUSE Labs