On Mon, Jul 17, 2023 at 10:33:14AM +0800, Miaohe Lin wrote:
> On 2023/7/15 11:50, Matthew Wilcox wrote:
> > On Sat, Jul 15, 2023 at 11:17:26AM +0800, Miaohe Lin wrote:
> >> Hwpoisoned dirty swap cache page is kept in the swap cache and there's
> >> simple interception code in do_swap_page() to catch it. But when trying
> >> to swapoff, unuse_pte() will wrongly install a general sense of "future
> >> accesses are invalid" swap entry for hwpoisoned swap cache page due to
> >> unaware of such type of page. The user will receive SIGBUS signal without
> >> expected BUS_MCEERR_AR payload.
> >
> > Have you observed this, or do you just think it's true?
> >
> >> +++ b/mm/swapfile.c
> >> @@ -1767,7 +1767,8 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
> >>  		swp_entry_t swp_entry;
> >>
> >>  		dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
> >> -		if (hwposioned) {
> >> +		/* Hwpoisoned swapcache page is also !PageUptodate. */
> >> +		if (hwposioned || PageHWPoison(page)) {
> >
> > This line makes no sense to me.  How do we get here with PageHWPoison()
> > being true and hwposioned being false?
>
> hwposioned will be true iff ksm_might_need_to_copy returns -EHWPOISON.
> And there's PageUptodate check in ksm_might_need_to_copy before we can return -EHWPOISON:
>
> ksm_might_need_to_copy
>  if (!PageUptodate(page))
>   return page;		/* let do_swap_page report the error */
>   ^^^
>   Will return here because hwpoisoned swapcache page is !PageUptodate(cleared via me_swapcache_dirty()).
>
> Or am I miss something?

Ah!  So we don't even get to calling copy_mc_to_kernel().  That seems
like a bug in ksm_might_need_to_copy(), don't you think?  Maybe this
would be a better fix:

+	if (PageHWPoison(page))
+		return ERR_PTR(-EHWPOISON);
 	if (!PageUptodate(page))
 		return page;
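
For readers following the thread, the ordering problem can be modelled in
userspace. What follows is only a rough sketch, not kernel code: struct
mock_page, enum mock_entry, mock_might_need_to_copy() and mock_unuse_pte()
are hypothetical stand-ins, and the copy step is simplified to "reports
poison if the page is poisoned". It shows why a page that is both
hwpoisoned and !PageUptodate ends up with the generic error entry today,
and with the hwpoison entry once the PageHWPoison() check comes first.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* fallback for hosts whose errno.h lacks it */
#endif

/* Hypothetical stand-in for struct page: only the two flags discussed. */
struct mock_page {
	bool uptodate;
	bool hwpoison;
};

/* Hypothetical outcomes of the mocked unuse_pte(). */
enum mock_entry {
	MOCK_MAP_PAGE,			/* page is mapped normally */
	MOCK_HWPOISON_ENTRY,		/* hwpoison swap entry installed */
	MOCK_GENERIC_ERROR_ENTRY,	/* "future accesses are invalid" entry */
};

/*
 * Model of the check ordering in ksm_might_need_to_copy().
 * Returns 0 when the page is handed back, -EHWPOISON on poison.
 * poison_check_first selects the ordering proposed in the reply.
 */
static int mock_might_need_to_copy(const struct mock_page *page,
				   bool poison_check_first)
{
	if (poison_check_first && page->hwpoison)
		return -EHWPOISON;
	if (!page->uptodate)
		return 0;	/* early return: the copy step is never reached */
	/* Assumption: the copy step is what notices the poison. */
	if (page->hwpoison)
		return -EHWPOISON;
	return 0;
}

/* Model of the entry selection in unuse_pte(). */
static enum mock_entry mock_unuse_pte(const struct mock_page *page,
				      bool poison_check_first)
{
	bool hwposioned =
		mock_might_need_to_copy(page, poison_check_first) == -EHWPOISON;

	if (hwposioned)
		return MOCK_HWPOISON_ENTRY;
	if (!page->uptodate)
		return MOCK_GENERIC_ERROR_ENTRY;
	return MOCK_MAP_PAGE;
}
```

With a page set up as { .uptodate = false, .hwpoison = true }, calling
mock_unuse_pte() with poison_check_first == false yields
MOCK_GENERIC_ERROR_ENTRY, while poison_check_first == true yields
MOCK_HWPOISON_ENTRY.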