On Tue, Sep 7, 2021 at 7:50 PM HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@xxxxxxx> wrote: > > On Tue, Sep 07, 2021 at 02:34:24PM -0700, Yang Shi wrote: > > On Fri, Sep 3, 2021 at 5:03 PM Yang Shi <shy828301@xxxxxxxxx> wrote: > > > > > > On Fri, Sep 3, 2021 at 11:01 AM Yang Shi <shy828301@xxxxxxxxx> wrote: > ... > > > > > > > > > > > > AFAIK the address_space error just works for fsync. Anyway I could be wrong. > > > > > > > > > > > > I think clearing the dirty flag might be the easiest way? It seems > > > > > > unnecessary to notify the users when writing back since the most write > > > > > > back happens asynchronously. They should be notified when the page is > > > > > > accessed, e.g. read/write and page fault. > > > > > > > > > > > > I did some further investigation and got a clearer picture for > > > > > > writeback filesystem: > > > > > > 1. The page should be not written back: clearing dirty flag could > > > > > > prevent from writeback > > > > > > 2. The page should be not dropped (it shows as a clean page): the > > > > > > refcount pin from hwpoison could prevent from invalidating (called by > > > > > > cache drop, inode cache shrinking, etc), but it doesn't avoid > > > > > > invalidation in DIO path (easy to deal with) > > > > > > 3. The page should be able to get truncated/hole punched/unlinked: it > > > > > > works as it is > > > > > > 4. Notify users when the page is accessed, e.g. read/write, page fault > > > > > > and other paths: this is hard > > > > > > > > > > > > The hardest part is #4. Since there are too many paths in filesystems > > > > > > that do *NOT* check if page is poisoned or not, e.g. read/write, > > > > > > compression (btrfs, f2fs), etc. A couple of ways to handle it off the > > > > > > top of my head: > > > > > > 1. Check hwpoison flag for every path, the most straightforward way, > > > > > > but a lot work > > > > > > 2. Return NULL for poisoned page from page cache lookup, the most > > > > > > callsites check if NULL is returned, this should have least work I > > > > > > think. But the error handling in filesystems just return -ENOMEM, the > > > > > > error code will incur confusion to the users obviously. > > > > > > 3. To improve #2, we could return error pointer, e.g. ERR_PTR(-EIO), > > > > > > but this will involve significant amount of code change as well since > > > > > > all the paths need check if the pointer is ERR or not. > > > > > > > > > > I think the approach #3 sounds good for now, it seems to me that these > > > > > statements are about general ways to handle error pages on all page cache > > > > > users, so then the amount of code changes is a big problem, but when > > > > > focusing on shmem/tmpfs, could the amount of changes be more handlable, or > > > > > still large? > > > > > > > > Yeah, I agree #3 makes more sense. Just return an error when finding > > > > out corrupted page. I think this is the right semantic. > > > > I had a prototype patchset for #3, but now I have to consider stepping > > back to rethink which way we should go. I actually did a patchset for > > #1 too. By comparing the two patchsets and some deeper investigation > > during preparing the two patchsets, I realized #3 may not be the best > > way. > > > > For #3 ERR_PTR will be returned so all the callers need to check the > > return value otherwise invalid pointer may be dereferenced, but not > > all callers really care about the content of the page, for example, > > partial truncate which just set the truncated range in one page to 0. > > So for such paths it needs additional modification if ERR_PTR is > > returned. And if the callers have their own way to handle the > > problematic pages we need to add a new FGP flag to tell FGP functions > > to return the pointer to the page. > > > > But we don't need to do anything for such paths if we go with #1. For > > most paths (e.g. read/write) both ways need to check the validity of > > the page by checking ERR PTR or page flag. But a lot of extra code > > could be saved for the paths which don't care about the actual data > > for #1. > > OK, so it's not clear which is better. Maybe we need discussion over > patches... Yeah, the conclusion is #1 would need fewer changes than #3. > > ... > > > > > > > > > > > > > > > > 5. We also could define a new FGP flag to return poisoned page, NULL > > > > > > or error pointer. This also should need significant code change since > > > > > > a lt callsites need to be contemplated. > > > > > > > > > > Could you explain a little more about which callers should use the flag? > > > > > > > > Just to solve the above invalidate/truncate problem and page fault > > > > doesn't expect an error pointer. But it seems the above > > > > invalidate/truncate paths don't matter. Page fault should be the only > > > > user since page fault may need unlock the page if poisoned page is > > > > returned. > > Orignally PG_hwpoison is detected in page fault handler for file-backed pages > like below, > > static vm_fault_t __do_fault(struct vm_fault *vmf) > ... > ret = vma->vm_ops->fault(vmf); > if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY | > VM_FAULT_DONE_COW))) > return ret; > > if (unlikely(PageHWPoison(vmf->page))) { > if (ret & VM_FAULT_LOCKED) > unlock_page(vmf->page); > put_page(vmf->page); > vmf->page = NULL; > return VM_FAULT_HWPOISON; > } > > so it seems to me that if a filesystem switches to the new scheme of keeping > error pages in page cache, ->fault() callback for the filesystem needs to be > aware of hwpoisoned pages inside it and returns VM_FAULT_HWPOISON when it > detects an error page (maybe by detecting pagecache_get_page() to return > PTR_ERR(-EIO or -EHWPOISON) or some other ways). In the other filesystems > with the old scheme, fault() callback does not return VM_FAULT_HWPOISON so > that page fault handler falls back to the generic PageHWPoison check above. Yes, exactly. If we return ERR_PTR we need modify the above code accordingly. But returning the page, no change is required. > > > > > > > It seems page fault check IS_ERR(page) then just return > > > VM_FAULT_HWPOISON. But I found a couple of places in shmem which want > > > to return head page then handle subpage or just return the page but > > > don't care the content of the page. They should ignore hwpoison. So I > > > guess we'd better to have a FGP flag for such cases. > > At least currently thp are supposed to be split before error handling. Not for file THP. > We could loosen this assumption to support hwpoison on a subpage of thp, > but I'm not sure whether that has some benefit. I don't quite get your point. Currently the hwpoison flag is set on specific subpage instead of head page. > And also this discussion makes me aware that we need to have a way to > prevent hwpoisoned pages from being used to thp collapse operation. Actually not. The refcount from hwpoison could prevent it from collapsing. But if we return ERR_PTR (#3) we need extra code in khugepaged to handle it. So I realized #1 would require fewer changes. And leaving the page state check to callers seem reasonable since the callers usually check other states, e.g. uptodate. > > Thanks, > Naoya Horiguchi