On Fri, 15 Feb 2019, Ira Weiny wrote:

> > > > for filesystems and processes. The only problems come in for the things
> > > > which bypass the page cache like O_DIRECT and DAX.
> > >
> > > It makes a lot of sense since the filesystems play COW etc games with the
> > > pages and RDMA is very much like O_DIRECT in that the pages are modified
> > > directly under I/O. It also bypasses the page cache in case you have
> > > not noticed yet.
> >
> > It is quite different, O_DIRECT modifies the physical blocks on the
> > storage, bypassing the memory copy.
> >
>
> Really? I thought O_DIRECT allowed the block drivers to write to/from user
> space buffers. But the _storage_ was still under the control of the block
> drivers?

It depends on what you see as the modification target. O_DIRECT uses
memory as a target and source, like RDMA. The block device is at the
other end of the handling.

> > RDMA modifies the memory copy.
> >
> > pages are necessary to do RDMA, and those pages have to be flushed to
> > disk.. So I'm not seeing how it can be disconnected from the page
> > cache?
>
> I don't disagree with this.

RDMA does direct access to memory. If that memory is an mmap of a
regular block device then we have a problem (this has not been a
standard use case to my knowledge). The semantics are simply different.
RDMA expects memory to be pinned and to always be readable and
writable. The block device/filesystem expects memory access to be
controllable via the page permissions. In particular, it must be
possible to stop access to the page.

This is fundamentally incompatible. RDMA access to such an mmapped
section must preserve the RDMA semantics while the pinning is in place,
and access control can only be provided after the RDMA is finished.
Pages in the RDMA range cannot be handled like normal page cache pages.

This is particularly evident in the DAX case, in which we have direct
pass-through even to the storage medium. And in this case write-through
can replace the page cache.