Alistair Popple wrote: > > Dave Chinner <david@xxxxxxxxxxxxx> writes: > > > On Fri, Mar 22, 2024 at 11:01:25AM +1100, Alistair Popple wrote: > >> > >> Dan Williams <dan.j.williams@xxxxxxxxx> writes: > >> > >> > Alistair Popple wrote: > >> >> > >> >> Alistair Popple <apopple@xxxxxxxxxx> writes: > >> >> > >> >> > Dan Williams <dan.j.williams@xxxxxxxxx> writes: > >> >> > > >> >> >> Alistair Popple wrote: > >> >> > > >> >> > I also noticed folio_anon() is not safe to call on a FS DAX page due to > >> >> > sharing PAGE_MAPPING_DAX_SHARED. > >> >> > >> >> Also it feels like I could be missing something here. AFAICT the > >> >> page->mapping and page->index fields can't actually be used outside of > >> >> fs/dax because they are overloaded for the shared case. Therefore > >> >> setting/clearing them could be skipped and the only reason for doing so > >> >> is so dax_associate_entry()/dax_disassociate_entry() can generate > >> >> warnings which should never occur anyway. So all that code is > >> >> functionally unnecessary. > >> > > >> > What do you mean outside of fs/dax, do you literally mean outside of > >> > fs/dax.c, or the devdax case (i.e. dax without fs-entanglements)? > >> > >> Only the cases fs dax pages might need it. ie. Not devdax which I > >> haven't looked at closely yet. > >> > >> > Memory > >> > failure needs ->mapping and ->index to rmap dax pages. See > >> > mm/memory-failure.c::__add_to_kill() and > >> > mm/memory-failure.c::__add_to_kill_fsdax() where that latter one is for > >> > cases where the fs needs has signed up to react to dax page failure. > >> > >> How does that work for reflink/shared pages which overwrite > >> page->mapping and page->index? > > > > Via reverse mapping in the *filesystem*, not the mm rmap stuff. > > > > pmem_pagemap_memory_failure() > > dax_holder_notify_failure() > > .notify_failure() > > xfs_dax_notify_failure() > > xfs_dax_notify_ddev_failure() > > xfs_rmap_query_range(xfs_dax_failure_fn) > > xfs_dax_failure_fn(rmap record) > > <grabs inode from cache> > > <converts range to file offset> > > mf_dax_kill_procs(inode->mapping, pgoff) > > collect_procs_fsdax(mapping, page) > > add_to_kill_fsdax(task) > > __add_to_kill(task) > > unmap_and_kill_tasks() > > > > Remember: in FSDAX, the pages are the storage media physically owned > > by the filesystem, not the mm subsystem. Hence answering questions > > like "who owns this page" can only be answered correctly by asking > > the filesystem. > > Thanks Dave for writing that up, it really helped solidify my > understanding of how this is all supposed to work. Yes, thanks Dave! > > We shortcut that for pages that only have one owner - we just store > > the owner information in the page as a {mapping, offset} tuple. But > > when we have multiple owners, the only way to find all the {mapping, > > offset} tuples is to ask the filesystem to find all the owners of > > that page. > > > > Hence the special case values for page->mapping/page->index for > > pages over shared filesystem extents. These shared extents are > > communicated to the fsdax layer via the IOMAP_F_SHARED flag > > in the iomaps returned by the filesystem. This flag is the trigger > > for the special mapping share count behaviour to be used. e.g. see > > dax_insert_entry(iomap_iter) -> dax_associate_entry(shared) -> > > dax_page_share_get().... > > > >> Eg. in __add_to_kill() if *p is a shared fs > >> dax page then p->mapping == PAGE_MAPPING_DAX_SHARED and > >> page_address_in_vma(vma, p) will probably crash. > > > > As per above, we don't get the mapping from the page in those cases. > > Yep, that all makes sense and I see where I was getting confsued. It was > because in __add_to_kill() we do actually use page->mapping when > page_address_in_vma(vma, p) is called. And because > folio_test_anon(page_folio(p)) is also true for shared FS DAX pages > (p->mapping == PAGE_MAPPING_DAX_SHARED/PAGE_MAPPING_DAX_SHARED) I > thought things would go bad there. > > However after re-reading the page_address_in_vma() implementation I > noticed that isn't the case, because folio_anon_vma(page_folio(p)) will > still return NULL which was the detail I had missed. > > So to go back to my original point it seems page->mapping and > page->index is largely unused on fs dax pages, even for memory > failure. Because for memory failure the mapping and fsdax_pgoff comes > from the filesystem not the page. Only for XFS, all other filesystems depend on typical rmap, and I believe XFS only uses that path if the fs is mounted with reflink support. You had asked about tests before and I do have a regression test for memory-failure in the ndctl regression suite, that should be validating this for xfs, and ext4: https://github.com/pmem/ndctl/blob/main/test/dax-poison.c#L47 meson test -C build --suite ndctl:ndctl There are setup instructions here: https://github.com/pmem/ndctl/blob/main/README.md I typically run it in its own VM. This run_qemu script can build and boot an enironment to run it: https://github.com/pmem/run_qemu There is also kdevops that includes it, but I have not tried that: https://github.com/linux-kdevops/kdevops