On Wed, Feb 6, 2019 at 3:21 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Wed, Feb 06, 2019 at 02:44:45PM -0800, Dan Williams wrote:
>
> > > Do they need to stick with xfs?
> >
> > Can you clarify the motivation for that question? This problem exists
> > for any filesystem that implements an mmap where the physical
> > page backing the mapping is identical to the physical storage location
> > for the file data.
>
> .. and needs to dynamically change that mapping. Which is not really
> something inherent to the general idea of a filesystem. A file system
> that had *strictly static* block assignments would work fine.
>
> Not all filesystems even implement hole punch.
>
> Not all filesystems implement reflink.
>
> ftruncate doesn't *have* to instantly return the free blocks to
> the allocation pool.
>
> ie this is not a DAX & RDMA issue but an XFS & RDMA issue.
>
> Replacing XFS is probably not reasonable, but I wonder if an XFS--
> operating mode could exist that had enough features removed to be
> safe?

You're describing the current situation, i.e. Linux already implements
this, it's called Device-DAX and some users of RDMA find it
insufficient. The choices are to continue to tell them "no", or say
"yes, but you need to submit to lease coordination".

> Ie turn off REFLINK. Change the semantic of ftruncate to be more like
> ETXTBUSY. Turn off hole punch.
>
> > > Are they really trying to do COW backed mappings for the RDMA
> > > targets? Or do they want a COW backed FS but are perfectly happy
> > > if the specific RDMA targets are *not* COW and are statically
> > > allocated?
> >
> > I would expect the COW to be broken at registration time. Only ODP
> > could possibly support reflink + RDMA. So I think this devolves the
> > problem back to just the "what to do about truncate/punch-hole"
> > problem in the specific case of non-ODP hardware combined with the
> > Filesystem-DAX facility.
>
> Usually the problem with COW is that you make a READ RDMA MR on a
> COW'd file, and some other thread breaks the COW..
>
> This probably becomes a problem if the same process that has the MR
> triggers a COW break (ie by writing to the CPU mmap). This would cause
> the page to be reassigned but the MR would not be updated, which is
> not what the app expects.
>
> WRITE is simpler, once the COW is broken during GUP, the pages cannot
> be COW'd again until the DMA pin is released. So new reflinks would be
> blocked during the DMA pin period.
>
> To fix READ you'd have to treat it like WRITE and break the COW at GUP.

Right, that's what I'm proposing: that any longterm-GUP break COW as if
it were a write.
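
To make the READ vs. WRITE point concrete, here is a toy user-space
model in Python. It is only a sketch, not kernel code: Block, File,
pin_longterm() and the rest are made-up names, each file has exactly one
block, and a reflink is assumed to simply fail while the source block is
pinned. The one behaviour it tries to show is a long-term pin breaking
COW even when the pin is read-only.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One unit of file storage (hypothetical, for illustration only)."""
    ident: int
    sharers: int = 1  # files whose extent map points at this block
    pins: int = 0     # long-term DMA pins currently held


class ToyFs:
    """Hands out fresh blocks; stands in for the block allocator."""
    def __init__(self):
        self._next = 0

    def alloc(self):
        blk = Block(self._next)
        self._next += 1
        return blk


class File:
    def __init__(self, fs):
        self.fs = fs
        self.block = fs.alloc()

    def reflink_from(self, src):
        """Share src's block; refused while that block is pinned."""
        if src.block.pins:
            return False
        self.block.sharers -= 1
        src.block.sharers += 1
        self.block = src.block
        return True

    def break_cow(self):
        """Give this file a private block if the current one is shared."""
        if self.block.sharers > 1:
            self.block.sharers -= 1
            self.block = self.fs.alloc()

    def cpu_write(self):
        """A store through the CPU mmap breaks COW like any write."""
        self.break_cow()

    def pin_longterm(self, write):
        """Assumed policy: break COW no matter what `write` says."""
        self.break_cow()
        self.block.pins += 1
        return self.block


def unpin(block):
    block.pins -= 1


def demo():
    fs = ToyFs()
    a, b = File(fs), File(fs)
    b.reflink_from(a)                        # a and b now share one block
    mr = b.pin_longterm(write=False)         # READ MR, still breaks COW
    moved_on_pin = mr is not a.block
    b.cpu_write()                            # same process writes via mmap
    mr_still_valid = mr is b.block           # block was not reassigned
    c = File(fs)
    reflink_while_pinned = c.reflink_from(b)
    unpin(mr)
    reflink_after_unpin = c.reflink_from(b)
    return (moved_on_pin, mr_still_valid,
            reflink_while_pinned, reflink_after_unpin)
```

In this model demo() returns (True, True, False, True): the read-only
pin already moved the file to a private block, the later CPU write does
not move it again, and a new reflink only succeeds once the pin is
dropped.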