On Mon 30-01-23 09:16:13, Kefeng Wang wrote: > > > On 2023/1/30 5:48, Andrew Morton wrote: > > On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote: > > > > > As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"), > > > > Merged in 2017. > > > > > hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg > > > could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could > > > occurs a NULL pointer dereference, let's do not record the foreign > > > writebacks for folio memcg is null in mem_cgroup_track_foreign() to > > > fix it. > > > > > > Reported-by: Ma Wupeng <mawupeng1@xxxxxxxxxx> > > > Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing") > > > > Merged in 2019. > > > > > --- a/include/linux/memcontrol.h > > > +++ b/include/linux/memcontrol.h > > > @@ -1688,10 +1688,13 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio, > > > static inline void mem_cgroup_track_foreign_dirty(struct folio *folio, > > > struct bdi_writeback *wb) > > > { > > > + struct mem_cgroup *memcg; > > > + > > > if (mem_cgroup_disabled()) > > > return; > > > - if (unlikely(&folio_memcg(folio)->css != wb->memcg_css)) > > > + memcg = folio_memcg(folio); > > > + if (unlikely(memcg && &memcg->css != wb->memcg_css)) > > > mem_cgroup_track_foreign_dirty_slowpath(folio, wb); > > > } > > > > Has this null deref actually been observed, or is this from code > > inspection? (This is why it's nice to include the Link: after a > > Reported-by!) > > It does occurs in our internal test and report by wupeng(based v5.10), > > BUG: KASAN: user-memory-access in > mem_cgroup_track_foreign_dirty_slowpath+0xc0/0x480 mm/memcontrol.c:4708 > Read of size 8 at addr 0000000000001000 by task syz-executor.2/28325 > > CPU: 2 PID: 28325 Comm: syz-executor.2 Tainted: G W > 5.10.0-03333-g48e46a146cbc-dirty #1 > Hardware name: linux,dummy-virt (DT) > Call trace: > ... > mem_cgroup_track_foreign_dirty_slowpath+0xc0/0x480 mm/memcontrol.c:4708 > mem_cgroup_track_foreign_dirty include/linux/memcontrol.h:1880 [inline] > account_page_dirtied+0x9a0/0xa90 mm/page-writeback.c:2436 > __set_page_dirty+0x1f8/0x4b0 fs/buffer.c:608 > __set_page_dirty_buffers+0x3d0/0x550 fs/buffer.c:668 > set_page_dirty+0x158/0x500 mm/page-writeback.c:2575 > filemap_page_mkwrite+0x3dc/0x490 mm/filemap.c:3224 > do_page_mkwrite+0xc4/0x3d0 mm/memory.c:2786 > wp_page_shared+0x14c/0x980 mm/memory.c:3118 > do_wp_page+0x930/0xbc0 mm/memory.c:3219 > handle_pte_fault+0x5e0/0x630 mm/memory.c:4570 > __handle_mm_fault+0x41c/0x910 mm/memory.c:4690 > handle_mm_fault+0x25c/0x484 mm/memory.c:4788 > __do_page_fault arch/arm64/mm/fault.c:440 [inline] > do_page_fault+0x3ac/0x9d4 arch/arm64/mm/fault.c:539 Just to make sure I understand. The page has been hwpoisoned, uncharged but stayed in the page cache so a next page fault on the address has blowned up? Say we address the NULL memcg case. What is the resulting behavior? Doesn't userspace access a poisoned page and get a silend memory corruption? -- Michal Hocko SUSE Labs