On Tue, Sep 21, 2021 at 2:49 AM Naoya Horiguchi <naoya.horiguchi@xxxxxxxxx> wrote: > > On Tue, Sep 14, 2021 at 11:37:17AM -0700, Yang Shi wrote: > > The current behavior of memory failure is to truncate the page cache > > regardless of dirty or clean. If the page is dirty the later access > > will get the obsolete data from disk without any notification to the > > users. This may cause silent data loss. It is even worse for shmem > > since shmem is in-memory filesystem, truncating page cache means > > discarding data blocks. The later read would return all zero. > > > > The right approach is to keep the corrupted page in page cache, any > > later access would return error for syscalls or SIGBUS for page fault, > > until the file is truncated, hole punched or removed. The regular > > storage backed filesystems would be more complicated so this patch > > is focused on shmem. This also unblock the support for soft > > offlining shmem THP. > > > > Signed-off-by: Yang Shi <shy828301@xxxxxxxxx> > > --- > > mm/memory-failure.c | 3 ++- > > mm/shmem.c | 25 +++++++++++++++++++++++-- > > 2 files changed, 25 insertions(+), 3 deletions(-) > > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > > index 54879c339024..3e06cb9d5121 100644 > > --- a/mm/memory-failure.c > > +++ b/mm/memory-failure.c > > @@ -1101,7 +1101,8 @@ static int page_action(struct page_state *ps, struct page *p, > > result = ps->action(p, pfn); > > > > count = page_count(p) - 1; > > - if (ps->action == me_swapcache_dirty && result == MF_DELAYED) > > + if ((ps->action == me_swapcache_dirty && result == MF_DELAYED) || > > + (ps->action == me_pagecache_dirty && result == MF_FAILED)) > > This new line seems to affect the cases of dirty page cache > on other filesystems, whose result is to miss "still referenced" > messages for some unmap failure cases (although it's not so critical). > So checking filesystem type (for example with shmem_mapping()) > might be helpful? > > And I think that if we might want to have some refactoring to pass > *ps to each ps->action() callback, then move this refcount check to > the needed places. > I don't think that we always need the refcount check, for example in > MF_MSG_KERNEL and MF_MSG_UNKNOWN cases (because no one knows the > expected values for these cases). Yeah, seems make sense to me. How's about doing the below (totally untested): static inline bool check_refcount(struct *page, bool dec) { int count = page_count(page) - 1; if (dec || shmem_mapping(page->mapping)) count -= 1; if (count > 0) { pr_err("Memory failure: %#lx: %s still referenced by %d users\n", pfn, action_page_types[ps->type], count); return false; } return true; } Then call this in the needed me_* functions and return right value per the return value of it. I think me_swapcache_dirty() is the only place need pass in true for dec parameter. > > > > count--; > > if (count > 0) { > > pr_err("Memory failure: %#lx: %s still referenced by %d users\n", > > diff --git a/mm/shmem.c b/mm/shmem.c > > index 88742953532c..ec33f4f7173d 100644 > > --- a/mm/shmem.c > > +++ b/mm/shmem.c > > @@ -2456,6 +2456,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping, > > struct inode *inode = mapping->host; > > struct shmem_inode_info *info = SHMEM_I(inode); > > pgoff_t index = pos >> PAGE_SHIFT; > > + int ret = 0; > > > > /* i_rwsem is held by caller */ > > if (unlikely(info->seals & (F_SEAL_GROW | > > @@ -2466,7 +2467,19 @@ shmem_write_begin(struct file *file, struct address_space *mapping, > > return -EPERM; > > } > > > > - return shmem_getpage(inode, index, pagep, SGP_WRITE); > > + ret = shmem_getpage(inode, index, pagep, SGP_WRITE); > > + > > + if (!ret) { > > Maybe this "!ret" check is not necessary because *pagep is set > non-NULL only when ret is 0. It could save one indent level. Yes, sure. > > > + if (*pagep) { > > + if (PageHWPoison(*pagep)) { > > + unlock_page(*pagep); > > + put_page(*pagep); > > + ret = -EIO; > > + } > > + } > > + } > > + > > + return ret; > > } > > > > static int > > @@ -2555,6 +2568,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > > unlock_page(page); > > } > > > > + if (page && PageHWPoison(page)) { > > + error = -EIO; > > + break; > > + } > > + > > /* > > * We must evaluate after, since reads (unlike writes) > > * are called without i_rwsem protection against truncate > > @@ -3782,7 +3800,6 @@ const struct address_space_operations shmem_aops = { > > #ifdef CONFIG_MIGRATION > > .migratepage = migrate_page, > > #endif > > - .error_remove_page = generic_error_remove_page, > > This change makes truncate_error_page() calls invalidate_inode_page(), > and in my testing it fails with "Failed to invalidate" message. > So as a result memory_failure() finally returns with -EBUSY. I'm not > sure it's expected because this patchset changes to keep error pages > in page cache as a proper error handling. > Maybe you can avoid this by defining .error_remove_page in shmem_aops > which simply returns 0. Yes, the "Failed to invalidate" message seems confusing. I agree a shmem specific callback is better. > > > }; > > EXPORT_SYMBOL(shmem_aops); > > > > @@ -4193,6 +4210,10 @@ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping, > > page = ERR_PTR(error); > > else > > unlock_page(page); > > + > > + if (PageHWPoison(page)) > > + page = NULL; > > + > > return page; > > One more comment ... > > - I guess that you add PageHWPoison() checks after some call sites > of shmem_getpage() and shmem_getpage_gfp(), but seems not cover all. > For example, mcontinue_atomic_pte() in mm/userfaultfd.c can properly > handle PageHWPoison? No, I didn't touch anything outside shmem.c. I could add this in the next version. BTW, I just found another problem for the change in shmem_read_mapping_page_gfp(), it should return ERR_PTR(-EIO) instead of NULL since the callers may not handle NULL. Will fix in the next version too. > > I'm trying to test more detail, but in my current understanding, > this patch looks promising to me. Thank you for your effort. Thank a lot for taking time do the review. > > Thanks, > Naoya Horiguchi