On Tue, Sep 07, 2021 at 02:34:24PM -0700, Yang Shi wrote: > On Fri, Sep 3, 2021 at 5:03 PM Yang Shi <shy828301@xxxxxxxxx> wrote: > > > > On Fri, Sep 3, 2021 at 11:01 AM Yang Shi <shy828301@xxxxxxxxx> wrote: ... > > > > > > > > > > AFAIK the address_space error just works for fsync. Anyway I could be wrong. > > > > > > > > > > I think clearing the dirty flag might be the easiest way? It seems > > > > > unnecessary to notify the users when writing back since the most write > > > > > back happens asynchronously. They should be notified when the page is > > > > > accessed, e.g. read/write and page fault. > > > > > > > > > > I did some further investigation and got a clearer picture for > > > > > writeback filesystem: > > > > > 1. The page should be not written back: clearing dirty flag could > > > > > prevent from writeback > > > > > 2. The page should be not dropped (it shows as a clean page): the > > > > > refcount pin from hwpoison could prevent from invalidating (called by > > > > > cache drop, inode cache shrinking, etc), but it doesn't avoid > > > > > invalidation in DIO path (easy to deal with) > > > > > 3. The page should be able to get truncated/hole punched/unlinked: it > > > > > works as it is > > > > > 4. Notify users when the page is accessed, e.g. read/write, page fault > > > > > and other paths: this is hard > > > > > > > > > > The hardest part is #4. Since there are too many paths in filesystems > > > > > that do *NOT* check if page is poisoned or not, e.g. read/write, > > > > > compression (btrfs, f2fs), etc. A couple of ways to handle it off the > > > > > top of my head: > > > > > 1. Check hwpoison flag for every path, the most straightforward way, > > > > > but a lot work > > > > > 2. Return NULL for poisoned page from page cache lookup, the most > > > > > callsites check if NULL is returned, this should have least work I > > > > > think. But the error handling in filesystems just return -ENOMEM, the > > > > > error code will incur confusion to the users obviously. > > > > > 3. To improve #2, we could return error pointer, e.g. ERR_PTR(-EIO), > > > > > but this will involve significant amount of code change as well since > > > > > all the paths need check if the pointer is ERR or not. > > > > > > > > I think the approach #3 sounds good for now, it seems to me that these > > > > statements are about general ways to handle error pages on all page cache > > > > users, so then the amount of code changes is a big problem, but when > > > > focusing on shmem/tmpfs, could the amount of changes be more handlable, or > > > > still large? > > > > > > Yeah, I agree #3 makes more sense. Just return an error when finding > > > out corrupted page. I think this is the right semantic. > > I had a prototype patchset for #3, but now I have to consider stepping > back to rethink which way we should go. I actually did a patchset for > #1 too. By comparing the two patchsets and some deeper investigation > during preparing the two patchsets, I realized #3 may not be the best > way. > > For #3 ERR_PTR will be returned so all the callers need to check the > return value otherwise invalid pointer may be dereferenced, but not > all callers really care about the content of the page, for example, > partial truncate which just set the truncated range in one page to 0. > So for such paths it needs additional modification if ERR_PTR is > returned. And if the callers have their own way to handle the > problematic pages we need to add a new FGP flag to tell FGP functions > to return the pointer to the page. > > But we don't need to do anything for such paths if we go with #1. For > most paths (e.g. read/write) both ways need to check the validity of > the page by checking ERR PTR or page flag. But a lot of extra code > could be saved for the paths which don't care about the actual data > for #1. OK, so it's not clear which is better. Maybe we need discussion over patches... ... > > > > > > > > > > > > 5. We also could define a new FGP flag to return poisoned page, NULL > > > > > or error pointer. This also should need significant code change since > > > > > a lt callsites need to be contemplated. > > > > > > > > Could you explain a little more about which callers should use the flag? > > > > > > Just to solve the above invalidate/truncate problem and page fault > > > doesn't expect an error pointer. But it seems the above > > > invalidate/truncate paths don't matter. Page fault should be the only > > > user since page fault may need unlock the page if poisoned page is > > > returned. Orignally PG_hwpoison is detected in page fault handler for file-backed pages like below, static vm_fault_t __do_fault(struct vm_fault *vmf) ... ret = vma->vm_ops->fault(vmf); if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY | VM_FAULT_DONE_COW))) return ret; if (unlikely(PageHWPoison(vmf->page))) { if (ret & VM_FAULT_LOCKED) unlock_page(vmf->page); put_page(vmf->page); vmf->page = NULL; return VM_FAULT_HWPOISON; } so it seems to me that if a filesystem switches to the new scheme of keeping error pages in page cache, ->fault() callback for the filesystem needs to be aware of hwpoisoned pages inside it and returns VM_FAULT_HWPOISON when it detects an error page (maybe by detecting pagecache_get_page() to return PTR_ERR(-EIO or -EHWPOISON) or some other ways). In the other filesystems with the old scheme, fault() callback does not return VM_FAULT_HWPOISON so that page fault handler falls back to the generic PageHWPoison check above. > > > > It seems page fault check IS_ERR(page) then just return > > VM_FAULT_HWPOISON. But I found a couple of places in shmem which want > > to return head page then handle subpage or just return the page but > > don't care the content of the page. They should ignore hwpoison. So I > > guess we'd better to have a FGP flag for such cases. At least currently thp are supposed to be split before error handling. We could loosen this assumption to support hwpoison on a subpage of thp, but I'm not sure whether that has some benefit. And also this discussion makes me aware that we need to have a way to prevent hwpoisoned pages from being used to thp collapse operation. Thanks, Naoya Horiguchi