On Thu, Oct 14, 2021 at 12:16:14PM -0700, Yang Shi wrote: > The current behavior of memory failure is to truncate the page cache > regardless of dirty or clean. If the page is dirty the later access > will get the obsolete data from disk without any notification to the > users. This may cause silent data loss. It is even worse for shmem > since shmem is in-memory filesystem, truncating page cache means > discarding data blocks. The later read would return all zero. > > The right approach is to keep the corrupted page in page cache, any > later access would return error for syscalls or SIGBUS for page fault, > until the file is truncated, hole punched or removed. The regular > storage backed filesystems would be more complicated so this patch > is focused on shmem. This also unblock the support for soft > offlining shmem THP. > > Signed-off-by: Yang Shi <shy828301@xxxxxxxxx> > --- > mm/memory-failure.c | 10 +++++++++- > mm/shmem.c | 37 ++++++++++++++++++++++++++++++++++--- > mm/userfaultfd.c | 5 +++++ > 3 files changed, 48 insertions(+), 4 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index cdf8ccd0865f..f5eab593b2a7 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -57,6 +57,7 @@ > #include <linux/ratelimit.h> > #include <linux/page-isolation.h> > #include <linux/pagewalk.h> > +#include <linux/shmem_fs.h> > #include "internal.h" > #include "ras/ras_event.h" > > @@ -866,6 +867,7 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p) > { > int ret; > struct address_space *mapping; > + bool extra_pins; > > delete_from_lru_cache(p); > > @@ -894,6 +896,12 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p) > goto out; > } > > + /* > + * The shmem page is kept in page cache instead of truncating > + * so is expected to have an extra refcount after error-handling. > + */ > + extra_pins = shmem_mapping(mapping); > + > /* > * Truncation is a bit tricky. Enable it per file system for now. > * > @@ -903,7 +911,7 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p) > out: > unlock_page(p); > > - if (has_extra_refcount(ps, p, false)) > + if (has_extra_refcount(ps, p, extra_pins)) > ret = MF_FAILED; > > return ret; > diff --git a/mm/shmem.c b/mm/shmem.c > index b5860f4a2738..69eaf65409e6 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -2456,6 +2456,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping, > struct inode *inode = mapping->host; > struct shmem_inode_info *info = SHMEM_I(inode); > pgoff_t index = pos >> PAGE_SHIFT; > + int ret = 0; > > /* i_rwsem is held by caller */ > if (unlikely(info->seals & (F_SEAL_GROW | > @@ -2466,7 +2467,15 @@ shmem_write_begin(struct file *file, struct address_space *mapping, > return -EPERM; > } > > - return shmem_getpage(inode, index, pagep, SGP_WRITE); > + ret = shmem_getpage(inode, index, pagep, SGP_WRITE); > + > + if (*pagep && PageHWPoison(*pagep)) { shmem_getpage() could return with pagep == NULL, so you need check ret first to avoid NULL pointer dereference. > + unlock_page(*pagep); > + put_page(*pagep); > + ret = -EIO; > + } > + > + return ret; > } > > static int > @@ -2555,6 +2564,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > unlock_page(page); > } > > + if (page && PageHWPoison(page)) { > + error = -EIO; Is it cleaner to add PageHWPoison() check in the existing "if (page)" block just above? Then, you don't have to check "page != NULL" twice. @@ -2562,7 +2562,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (sgp == SGP_CACHE) set_page_dirty(page); unlock_page(page); + if (PageHWPoison(page)) { + error = -EIO; + break; + } } /* Thanks, Naoya Horiguchi