Thank you again, Christoph, for your earlier answer. A quick question: would I be correct in assuming that if I mmap()ed each 8-64 MB chunk file from XFS and then did RDMA from the mmap()ed region, the data would first be copied from NVMe into DRAM (does this bypass the CPU?) and *then* be copied across RDMA, rather than being copied directly from NVMe by RDMA? Or does O_DIRECT properly allow bypassing straight to NVMe for RDMA?

As for what this #1 entry is doing, though: each of the 512 nodes has its own separate XFS filesystem as well as its own separate RocksDB, both backed by NVMe. They do filesystem operations almost entirely in user mode (no kernel, no FUSE) by intercepting application binaries, rewriting syscall instructions into jumps into their user-mode library code, and using message passing to drive RDMA transfers between application memory and a remote node's NVMe. I don't believe they've modified XFS, nor are they using pNFS. I don't know whether there's any mechanism other than mmap() followed by RDMA on that region, though.

- Dan

> On 20 Oct 2021, at 17:35, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Wed, Oct 20, 2021 at 09:33:43AM -0700, Christoph Hellwig wrote:
>> On Wed, Oct 20, 2021 at 12:51:05PM +0100, Dan Greenfield wrote:
>>> Do you have any ideas how they could have been able to utilise RDMA so that node A can directly access data chunks stored on XFS on node B? Is the only approach to mmap the chunk on node B and then RDMA it to/from node A?
>>
>> I'm not going to watch a video, but with the pNFS code other nodes can
>> access data on an XFS node directly using any SCSI transport.
>> For RDMA that would be SRP or iSCSI/iSER.
>>
>> Note that I also have an unfinished draft to support NVMe, which has
>> an RDMA transports as well and someone else could trivially reimplement
>> that as well.
>
> Oh, and just FYI here are my slides on the pNFS support:
>
> https://events.static.linuxfound.org/sites/events/files/slides/pnfs.pdf
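For illustration, here is a minimal C sketch of the two local data paths in the question, under stated assumptions. map_chunk() follows the mmap() route: when the mapping is touched, or pinned during RDMA memory registration (e.g. by ibv_reg_mr()), XFS reads the file into the page cache, so the NVMe device DMAs into DRAM and the RNIC later DMAs out of those page-cache pages. read_chunk_direct() follows the O_DIRECT route: the NVMe device DMAs straight into a page-aligned user buffer, skipping the page cache, and that buffer is what would be registered. Neither route needs a CPU memcpy, but in both the data lands in host DRAM before the RNIC reads it. The helper names, the 4096-byte DIO_ALIGN value and the buffered fallback are hypothetical choices for this sketch; the ibverbs registration is only mentioned in comments, not called.

```c
#define _GNU_SOURCE /* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define DIO_ALIGN 4096 /* assumed logical-block/page alignment for O_DIRECT */

/* mmap() path (hypothetical helper): pages are faulted into the page cache
 * (DRAM) on first touch, or when pinned for RDMA registration. Returns the
 * mapping or NULL; *len receives the file size. */
static void *map_chunk(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { /* mmap of 0 bytes fails */
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return p;
}

/* O_DIRECT path (hypothetical helper): the device DMAs into an aligned user
 * buffer, bypassing the page cache; this buffer is what would be handed to
 * ibv_reg_mr(). Returns a buffer to free(), or NULL; *len = bytes read. */
static void *read_chunk_direct(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL) /* fs rejects O_DIRECT: buffered fallback */
        fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }
    /* O_DIRECT needs an aligned buffer and an aligned transfer size. */
    size_t cap = ((size_t)st.st_size + DIO_ALIGN - 1) / DIO_ALIGN * DIO_ALIGN;
    if (cap == 0)
        cap = DIO_ALIGN;
    void *buf = NULL;
    if (posix_memalign(&buf, DIO_ALIGN, cap) != 0) {
        close(fd);
        return NULL;
    }
    size_t done = 0;
    while (done < (size_t)st.st_size) { /* stop at EOF; a short tail read ends it */
        ssize_t n = pread(fd, (char *)buf + done, cap - done, (off_t)done);
        if (n < 0) {
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;
        done += (size_t)n;
    }
    close(fd);
    *len = done;
    return buf;
}
```

Both helpers return the chunk bytes plus their length; munmap() the mapping or free() the O_DIRECT buffer when finished. The buffered fallback exists only so the sketch still runs on filesystems that reject O_DIRECT at open() time; XFS supports O_DIRECT, so on XFS it should not be taken.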