> On Jul 6, 2024, at 2:42 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Sat, Jul 06, 2024 at 04:37:22PM +1000, NeilBrown wrote:
>>> a different scheme for bypassing the server for I/O. Maybe there is
>>> a really good killer argument for doing that, but it needs to be clearly
>>> stated and defended instead of assumed.
>>
>> Could you provide a reference to the text book - or RFC - that describes
>> a pNFS DS protocol that completely bypasses the network, allowing the
>> client and MDS to determine if they are the same host and to potentially
>> do zero-copy IO.
>
> I did not say that we have the exact same functionality available and
> there is no work to do at all, just that it is the standard way to bypass
> the server.
>
> RFC 5662, RFC 5663 and RFC 8154 specify layouts that completely bypass
> the network and require the client and server to find out that they talk
> to the same storage device, and directly perform zero copy I/O.
> They do not require to be on the same host, though.
>
>> If not, I will find it hard to understand your claim that it is "the
>> text book example".
>
> pNFS is all about handing out grants to bypass the server for I/O.
> That is exactly what localio is doing.

In particular, Neil, a pNFS block/SCSI layout provides the client with
a set of device IDs. If the client is on the same storage fabric as
those devices, it can then access those devices directly using SCSI
commands rather than going on the network [RFC8154].

This is equivalent to a loopback acceleration mechanism. If the client
and server are on the same host, then there are natural ways to expose
the devices to both peers, and the existing pNFS protocol and SCSI
Persistent Reservation provide strong access authorization.

Both the Linux NFS client and server implement RFC 8154 well enough
that this could be an alternative or even a better solution than
LOCALIO.
The server stores an XFS file system on the devices, and hands out
layouts with the device ID and LBAs of the extents where file content
is located.

The fly in this ointment is the need for NFSv3 support.

In an earlier email Mike mentioned that Hammerspace isn't interested
in providing a centrally managed directory of block devices that could
be utilized by the MDS to simply inform the client of local devices. I
don't think that's the only possible solution for discovering the
locality of storage devices.

--
Chuck Lever