On Fri, Oct 15, 2021 at 1:28 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, 14 Oct 2021 12:16:09 -0700 Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> > When discussing the patch that splits page cache THP in order to offline the
> > poisoned page, Naoya mentioned there is a bigger problem [1] that prevents this
> > from working, since the page cache page will be truncated if uncorrectable
> > errors happen. Looking into this more deeply, it turns out this approach (truncating
> > the poisoned page) may incur silent data loss for all non-readonly filesystems if
> > the page is dirty. It may be worse for in-memory filesystems, e.g. shmem/tmpfs,
> > since the data blocks are actually gone.
> >
> > To solve this problem we could keep the poisoned dirty page in page cache, then
> > notify the users on any later access, e.g. page fault, read/write, etc. Clean
> > pages could be truncated as is, since they can be reread from disk later on.
> >
> > The consequence is that filesystems may find a poisoned page and manipulate it as
> > a healthy page, since no filesystem actually checks whether the page is
> > poisoned in any of the relevant paths except page fault. In general, we
> > need to make filesystems aware of poisoned pages before we can keep
> > poisoned pages in page cache to solve the data loss problem.
>
> Is the "RFC" still accurate, or might it be an accidental leftover?

Yeah, I think it can be removed.

>
> I grabbed this series as-is for some testing, but I do think it would
> be better if it was delivered as two separate series - one series for
> the -stable material and one series for the 5.16-rc1 material.

Yeah, patches 1/6 and 2/6 should go to -stable, and the
remaining patches are for 5.16-rc1.

Thanks for taking them.