On Thu, Feb 07, 2019 at 09:24:05AM -0800, Matthew Wilcox wrote:
> On Thu, Feb 07, 2019 at 11:25:35AM -0500, Doug Ledford wrote:
> > * Really though, as I said in my email to Tom Talpey, this entire
> > situation is simply screaming that we are doing DAX networking wrong.
> > We shouldn't be writing the networking code once in every single
> > application that wants to do this. If we had a memory segment that we
> > shared from server to client(s), and in that memory segment we
> > implemented a clustered filesystem, then applications would simply mmap
> > local files and be done with it. If the file needed to move, the kernel
> > would update the mmap in the application, done. If you ask me, it is
> > the attempt to do this the wrong way that is resulting in all this
> > heartache. That said, for today, my recommendation would be to require
> > ODP hardware for XFS filesystem with the DAX option, but allow ext2
> > filesystems to mount DAX filesystems on non-ODP hardware, and go in and
> > modify the ext2 filesystem so that on DAX mounts, it disables hole punch
> > and ftruncate any time they would result in the forced removal of an
> > established mmap.
>
> I agree that something's wrong, but I think the fundamental problem is
> that there's no concept in RDMA of having an STag for storage rather
> than for memory.
>
> Imagine if we could associate an STag with a file descriptor on the
> server. The client could then perform an RDMA to that STag. On the
> server, we'd need lots of smarts in the card and in the OS to know how
> to treat that packet on arrival -- depending on what the file descriptor
> referred to, it might only have to write into the page cache, or it
> might set up an NVMe DMA, or it might resolve the underlying physical
> address and DMA directly to an NV-DIMM.

I think you just described ODP MRs.

Jason