On Wed, Oct 09, 2019 at 10:11:52AM -0700, Darrick J. Wong wrote: > On Tue, Oct 08, 2019 at 11:31:44PM -0700, Christoph Hellwig wrote: > > Btw, I just had a chat with Dan last week on this. And he pointed out > > that while this series deals with the read/write path issues of > > reflink on DAX it doesn't deal with the mmap side issue that > > page->mapping and page->index can point back to exactly one file. > > > > I think we want a few xfstests that reflink a file and then use the > > different links using mmap, as that should blow up pretty reliably. > > Hmm, you're right, we don't actually have a test that checks the > behavior of mwriting all copies of a shared block. Ok, I'll go write > one. I've pointed this problem out to everyone who has asked me "what do we need to do to support reflink on DAX". I've even walked a couple of people right through the problem that needs to be solved and discussed the potential solutions to it. Problems that I think need addressing: - device dax and filesystem dax have fundamentally different needs in this space, so they need to be separated and not try to use the same solution. - dax_lock_entry() being used as a substitute for page_lock() but it not being held on the page itself means it can't be extended to serialise access to the page across multiple mappings that are unaware of each other - dax_lock_page/dax_unlock_page interface for hardware memory errors needs to report to the filesystem for processing and repair, not assume the page is user data and killing processes is the only possible recovery mechanism. - dax_associate_entry/dax_disassociate_entry can only work for a 1:1 page:mapping,index relationship. It needs to go away and be replaced by a mechanism that allows tracking multiple page mapping/index/state tuples. This has much wider use than DAX (e.g. sharing page cache pages between reflinked files) I've proposed shadow pages (based on a concept from Matethw Wilcox) for each read-only reflink mapping with the real physical page being owned by the filesystem and indexed by LBA in the filesystem buffer cache. This would be based on whether the extent in the file the page is mapped from has multiple references to it. i.e. When a new page mapping occurs in a shared extent, we add the page to the buffer cache (i.e. point a struct xfs_buf at it)i if it isn't already present, then allocate a shadow page, point it at the master, set it up with the new mapping,index tuple and add it to the mapping tree. Then we can treat it as a unique page even though it points to the read-only master page. When the page get's COWed, we toss away the shadow page and the master can be reclaimed with the reference count goes to zero or the extent is no longer shared. Think of it kind of like the way we multiply reference the zero page for holes in mmap()d dax regions, except we can have millions of them and they are found by physical buffer cache index lookups. This works for both DAX and non-DAX sharing of read-only shared filesytsem pages. i.e. it would form the basis of single-copy read-only page cache pages for reflinked files. There was quite a bit of talk at LSFMM 2018 about having a linked list of mapping structures hanging off a struct page, one for each mapping that references the page. Operations would then have to walk all mappings that reference the page. This was useful to other subsystems (HMM?) for some purpose I forget, but I'm not sure it's particularly useful by itself for non-dax reflink purposes - I suspect the filesystem would still need to track such pages itself in it's buffer cache so it can find the cached page to link new reflink copies to the same page... ISTR a couple of other solutions were thrown around, but I don't think anyone came up with a simple solution... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx