On Wed, Mar 13, 2019 at 09:11:13AM +1100, Dave Chinner wrote:
> On Tue, Mar 12, 2019 at 03:39:33AM -0700, Ira Weiny wrote:
> > IMHO I don't think that the copy_file_range() is going to carry us through the
> > next wave of user performance requirements. RDMA, while the first, is not the
> > only technology which is looking to have direct access to files. XDP is
> > another.[1]
>
> Sure, all I was doing here was demonstrating that people have been
> trying to get local direct access to file mappings to DMA directly
> into them for a long time. Direct IO games like these are now
> largely unnecessary because we now have much better APIs to do
> zero-copy data transfer between files (which can do hardware offload
> if it is available!).
>
> It's the long term pins that RDMA does that are the problem here.
> I'm assuming that for XDP, you're talking about userspace zero copy
> from files to the network hardware and vice versa? Transmit is
> simple (read-only mapping), but receive probably requires bpf
> programs to ensure that data (minus headers) in the incoming packet
> stream is correctly placed into the UMEM region?

Yes, exactly.

> XDP receive seems pretty much like the same problem as RDMA writes
> into the file. i.e. the incoming write DMAs are going to have to
> trigger page faults if the UMEM is a long term pin so the filesystem
> behaves correctly with this remote data placement. I'd suggest that
> RDMA, XDP and any other hardware that is going to pin
> file-backed mappings for the long term need to use the same "inform
> the fs of a write operation into its mapping" mechanisms...

Yes, agreed. I have a hack patch I'm testing right now which allows the user to
take a LAYOUT lease from user space, and GUP triggers on that, either allowing
or rejecting the pin based on the lease. I think this is the first step of what
Jan suggested.[1] There is a lot more detail to work out with what happens if
that lease needs to be broken.
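
To make the lease-gated pin idea concrete, here is a toy user-space model in
Python. It is only a sketch of the decision described above, not the hack patch
itself: every name (ToyFile, take_layout_lease, gup_longterm) is hypothetical,
the errno values are assumptions, and lease breaking is left out because that
part is still undecided.

```python
import errno


class ToyFile:
    """Stand-in for a file: tracks one lease holder and its long-term pins."""

    def __init__(self, name):
        self.name = name
        self.layout_lease_holder = None  # owner currently holding the lease
        self.long_term_pins = set()      # owners with a long-term pin


def take_layout_lease(f, owner):
    """Grant the (hypothetical) LAYOUT lease unless someone else holds it."""
    if f.layout_lease_holder not in (None, owner):
        return -errno.EAGAIN  # assumed error code for a contended lease
    f.layout_lease_holder = owner
    return 0


def gup_longterm(f, owner):
    """Allow a long-term pin only if the caller holds the lease."""
    if f.layout_lease_holder != owner:
        return -errno.EPERM  # assumed error code for a rejected pin
    f.long_term_pins.add(owner)
    return 0
```

In this model a pin attempted before the lease is taken is rejected, and the
same call succeeds once the caller holds the lease. What should happen to
existing pins when the lease has to be broken is deliberately not modelled.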

> And if we start talking about wanting to do peer-to-peer DMA from
> network/GPU device to storage device without going through a
> file-backed CPU mapping, we still need to have the filesystem
> involved to translate file offsets to storage locations the
> filesystem has allocated for the data and to lock them down for as
> long as the peer-to-peer DMA offload is in place. In effect, this
> is the same problem as RDMA+FS-DAX - the filesystem owns the file
> offset to storage location mapping and manages storage access
> arbitration, not the mm/vma mapping presented to userspace....

I've only daydreamed about peer-to-peer transfers. But yes, I think this is the
direction we need to go. But the details of doing a

GPU -> RDMA -> {network} -> RDMA -> FS DAX

and back again... without CPU/OS involvement are only a twinkle in my eye... If
that.

Ira

[1] https://lore.kernel.org/lkml/20190212160707.GA19076@xxxxxxxxxxxxxx/