On Thu, Feb 07, 2019 at 11:25:35AM -0500, Doug Ledford wrote:
> * Really though, as I said in my email to Tom Talpey, this entire
> situation is simply screaming that we are doing DAX networking wrong.
> We shouldn't be writing the networking code once in every single
> application that wants to do this. If we had a memory segment that we
> shared from server to client(s), and in that memory segment we
> implemented a clustered filesystem, then applications would simply mmap
> local files and be done with it. If the file needed to move, the kernel
> would update the mmap in the application, done. If you ask me, it is
> the attempt to do this the wrong way that is resulting in all this
> heartache. That said, for today, my recommendation would be to require
> ODP hardware for XFS filesystem with the DAX option, but allow ext2
> filesystems to mount DAX filesystems on non-ODP hardware, and go in and
> modify the ext2 filesystem so that on DAX mounts, it disables hole punch
> and ftruncate any time they would result in the forced removal of an
> established mmap.

I agree that something's wrong, but I think the fundamental problem is
that there's no concept in RDMA of having an STag for storage rather
than for memory.

Imagine if we could associate an STag with a file descriptor on the
server. The client could then perform an RDMA to that STag. On the
server, we'd need lots of smarts in the card and in the OS to know how
to treat that packet on arrival -- depending on what the file descriptor
referred to, it might only have to write into the page cache, or it might
set up an NVMe DMA, or it might resolve the underlying physical address
and DMA directly to an NV-DIMM.
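
To make the dispatch a bit more concrete, here is a rough sketch in C of
what the server side might decide when a write arrives for an STag. Nothing
like this exists in the verbs API or the kernel today: the table, the enum
names and stag_route_write() are all hypothetical, and the real work (in
the card and in the OS) is reduced to returning which path would be taken.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: what the file descriptor behind an STag refers to. */
enum stag_backing {
	STAG_BACKING_PAGE_CACHE,	/* ordinary file, cached in RAM */
	STAG_BACKING_NVME,		/* file living on an NVMe device */
	STAG_BACKING_NVDIMM,		/* DAX file on an NV-DIMM */
};

/* Hypothetical: how the server would handle an incoming RDMA write. */
enum stag_action {
	STAG_ACTION_WRITE_PAGE_CACHE,	/* copy the payload into the page cache */
	STAG_ACTION_NVME_DMA,		/* set up an NVMe DMA */
	STAG_ACTION_DIRECT_DMA,		/* resolve physical address, DMA to NV-DIMM */
	STAG_ACTION_REJECT,		/* unknown STag */
};

/* Hypothetical association of an STag with a server-side fd. */
struct stag_entry {
	uint32_t stag;
	int fd;
	enum stag_backing backing;
};

/* Linear lookup; a real implementation would want something faster. */
static const struct stag_entry *
stag_lookup(const struct stag_entry *table, size_t n, uint32_t stag)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].stag == stag)
			return &table[i];
	return NULL;
}

/* Decide how to treat a write packet on arrival, based on the fd's backing. */
enum stag_action
stag_route_write(const struct stag_entry *table, size_t n, uint32_t stag)
{
	const struct stag_entry *e = stag_lookup(table, n, stag);

	if (!e)
		return STAG_ACTION_REJECT;

	switch (e->backing) {
	case STAG_BACKING_PAGE_CACHE:
		return STAG_ACTION_WRITE_PAGE_CACHE;
	case STAG_BACKING_NVME:
		return STAG_ACTION_NVME_DMA;
	case STAG_BACKING_NVDIMM:
		return STAG_ACTION_DIRECT_DMA;
	}
	return STAG_ACTION_REJECT;
}
```

The point of the sketch is only that the client sees one STag and never
learns which of the three paths the server took; where the lookup lives
(card, driver or filesystem) is left open.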