On Wed, Jun 21, 2017 at 09:53:46AM +1000, Dave Chinner wrote:
> On Tue, Jun 20, 2017 at 09:17:36AM -0700, Dan Williams wrote:
> > On Tue, Jun 20, 2017 at 1:49 AM, Christoph Hellwig <hch@xxxxxx> wrote:
> > > [stripped giant fullquotes]
> > >
> > > On Mon, Jun 19, 2017 at 10:53:12PM -0700, Andy Lutomirski wrote:
> > >> But that's my whole point.  The kernel doesn't really need to prevent
> > >> all these background maintenance operations -- it just needs to block
> > >> .page_mkwrite until they are synced.  I think that whatever new
> > >> mechanism we add for this should be sticky, but I see no reason why
> > >> the filesystem should have to block reflink on a DAX file entirely.
> > >
> > > Agreed - IFF we want to support write through semantics this is the
> > > only somewhat feasible way.  It still has massive downsides of forcing
> > > the full sync machinery to run from the page fault handler, which
> > > I'm rather scared of, but that's still better than creating a magic
> > > special case that isn't manageable at all.
> >
> > An immutable-extent DAX-file and a reflink-capable DAX-file are not
> > mutually exclusive,
>
> Actually, they are mutually exclusive: when the immutable extent DAX
> inode is breaking the extent sharing done during the reflink
> operation, the copy-on-write operation requires allocating and
> freeing extents on the inode that has immutable extents. Which, if
> the inode really has immutable extents, cannot be done.
>
> That said, if the extent sharing is broken on the other side of the
> reflink (i.e. the non-immutable inode created by the reflink) then
> the extent map of the inode with immutable extents will remain
> unchanged. i.e. there are two sides to this, and if you only see one
> side you might come to the wrong conclusion.
>
> However, we cannot guarantee that no writes occur to the inode with
> immutable extent maps (especially as the whole point is to allow
> userspace writes and commits without the kernel being involved), so
> extent sharing on immutable extent maps cannot be allowed...

Just to play devil's advocate...

/If/ you have rmap and /if/ you discover that there's only one
IOMAP_IMMUTABLE file owning this same block and /if/ you're willing to
relocate every other mapping on the whole filesystem, /then/ you could
/in theory/ support shared daxfiles.

However, that's so many on-disk metadata lookups to shove into a
pagefault handler that I don't think anyone in XFSland would entertain
such an ugly fantasy.  You'd be making a lot of metadata requests, and
you'd have to lock the rmapbt while grabbing inodes, which is insane.

Much easier to have a per-inode flag that says "the block map of this
file does not change" and put up with the restricted semantics.

--D

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
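
For illustration only, the two policies contrasted in the reply above can
be modelled in a few lines of userspace Python. This is a toy sketch, not
XFS code: ToyFs, Inode, owners(), reflink_simple() and
unshare_for_write() are all hypothetical names, the "rmap" is a plain
scan over in-memory dicts, and the allocator just hands out increasing
block numbers. It only shows why the per-inode flag is a single check
while the rmap-based scheme has to visit every other owner of the block.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    ino: int
    # Stand-in for the per-inode "block map of this file does not change" flag.
    immutable: bool = False
    # Toy extent map: file offset -> physical block number.
    extents: dict = field(default_factory=dict)


class ToyFs:
    """Hypothetical in-memory model of a filesystem; not real kernel code."""

    def __init__(self):
        self.inodes = {}
        self.next_free = 1000  # toy allocator: next unused block number

    def add(self, inode):
        self.inodes[inode.ino] = inode
        return inode

    def owners(self, block):
        # Toy reverse mapping: scan every extent map for this block.
        # A real rmap would be an on-disk btree lookup.
        return [(inode, off)
                for inode in self.inodes.values()
                for off, blk in inode.extents.items()
                if blk == block]

    def reflink_simple(self, src, dst, off):
        # The "much easier" policy: one flag check, no sharing at all
        # when either side has an immutable block map.
        if src.immutable or dst.immutable:
            raise PermissionError("block map of this file does not change")
        dst.extents[off] = src.extents[off]

    def unshare_for_write(self, block):
        # The devil's-advocate policy: if exactly one immutable file owns
        # the block, leave it in place and relocate every other mapping.
        owners = self.owners(block)
        pinned = {inode.ino for inode, _ in owners if inode.immutable}
        if len(pinned) != 1:
            return False
        for inode, off in owners:
            if not inode.immutable:
                inode.extents[off] = self.next_free
                self.next_free += 1
        return True
```

In the toy model the cost of unshare_for_write() grows with the number of
owners of the block, which is the part that would have to run from the
fault path; reflink_simple() never looks beyond the two inodes involved.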