On Wed, Aug 24, 2022 at 02:52:51PM -0700, Dan Williams wrote: > Shiyang Ruan wrote: > > This new function is a variant of mf_generic_kill_procs that accepts a > > file, offset pair instead of a struct to support multiple files sharing > > a DAX mapping. It is intended to be called by the file systems as part > > of the memory_failure handler after the file system performed a reverse > > mapping from the storage address to the file and file offset. > > > > Signed-off-by: Shiyang Ruan <ruansy.fnst@xxxxxxxxxxx> > > Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx> > > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > > Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > Reviewed-by: Miaohe Lin <linmiaohe@xxxxxxxxxx> > > --- > > include/linux/mm.h | 2 + > > mm/memory-failure.c | 96 ++++++++++++++++++++++++++++++++++++++++----- > > 2 files changed, 88 insertions(+), 10 deletions(-) > > Unfortunately my test suite was only running the "non-destructive" set > of 'ndctl' tests which skipped some of the complex memory-failure cases. > Upon fixing that, bisect flags this commit as the source of the following > crash regression: Thank you for testing/reporting. > > kernel BUG at mm/memory-failure.c:310! > invalid opcode: 0000 [#1] PREEMPT SMP PTI > CPU: 26 PID: 1252 Comm: dax-pmd Tainted: G OE 5.19.0-rc4+ #58 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 > RIP: 0010:add_to_kill+0x304/0x400 > [..] > Call Trace: > <TASK> > collect_procs.part.0+0x2c8/0x470 > memory_failure+0x979/0xf30 > do_madvise.part.0.cold+0x9c/0xd3 > ? lock_is_held_type+0xe3/0x140 > ? find_held_lock+0x2b/0x80 > ? lock_release+0x145/0x2f0 > ? lock_is_held_type+0xe3/0x140 > ? syscall_enter_from_user_mode+0x20/0x70 > __x64_sys_madvise+0x56/0x70 > do_syscall_64+0x3a/0x80 > entry_SYSCALL_64_after_hwframe+0x46/0xb0 This stacktrace shows that VM_BUG_ON_VMA() in dev_pagemap_mapping_shift() was triggered. I think that BUG_ON is too harsh here because address == -EFAULT means that there's no mapping for the address. The subsequent code considers "tk->size_shift == 0" as "no mapping" cases, so dev_pagemap_mapping_shift() can return 0 in such a case? Could the following diff work for the issue? diff --git a/mm/memory-failure.c b/mm/memory-failure.c --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -316,7 +316,8 @@ static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma, pmd_t *pmd; pte_t *pte; - VM_BUG_ON_VMA(address == -EFAULT, vma); + if (address == -EFAULT) + return 0; pgd = pgd_offset(vma->vm_mm, address); if (!pgd_present(*pgd)) return 0; @@ -390,7 +391,8 @@ static void add_to_kill(struct task_struct *tsk, struct page *p, if (tk->addr == -EFAULT) { pr_info("Unable to find user space address %lx in %s\n", page_to_pfn(p), tsk->comm); - } else if (tk->size_shift == 0) { + } + if (tk->size_shift == 0) { kfree(tk); return; } Thanks, Naoya Horiguchi > > This is from running: > > meson test -C build dax-ext4.sh > > ...from the ndctl repo. > > I will take look, and posting it here in case I do not find it tonight > and Ruan can take a look.