[Please word wrap email text at 68-72 columns]

Anna, I think we need to consider how to integrate this
functionality across the entire storage stack, not just for NFS
client/server optimisation. My comments are made with this in mind.

On Tue, Jan 14, 2025 at 04:38:03PM -0500, Anna Schumaker wrote:
> I've seen a few requests for implementing the NFS v4.2 WRITE_SAME
> [1] operation over the last few months [2][3] to accelerate
> writing patterns of data on the server, so it's been in the back
> of my mind for a future project. I'll need to write some code
> somewhere so NFS & NFSD can handle this request. I could keep any
> implementation internal to NFS / NFSD, but I'd like to find out if
> local filesystems would find this sort of feature useful and if I
> should put it in the VFS instead.

How closely does this match the block device WRITE_SAME
(SCSI/NVMe) commands? I note there is a reference to this in the
RFC, but there are no details given.

i.e. is this NFS request something we can pass straight through to
the server side storage hardware if it supports hardware WRITE_SAME
commands, or do they have incompatible semantics?

If the two are compatible, then I think we really want server side
hardware offload to be possible. That requires the filesystem to
allocate/map the physical storage and then call into the block
layer to either offload it to the hardware or emulate it in
software (similar to how blkdev_issue_zeroout() works).

> I was thinking I could keep it simple, and model a function call
> based on write(3) / pwrite(3) to write some pattern N times
> starting at either the file's current offset or at a user-provided
> offset. Something like:
>
> write_pattern(int filedes, const void *pattern, size_t nbytes, size_t count);
> pwrite_pattern(int filedes, const void *pattern, size_t nbytes, size_t count, offset_t offset);

Apart from noting that pwritev2(RWF_ENCODED) would have been able
to support this, I'll let other people decide what the best
user/syscall API will be for this.

> I could then construct a WRITE_SAME call in the NFS client using
> this information. This seems "good enough" to me for what people
> have asked for, at least as a client-side interface. It wouldn't
> really help the server, which would still need to do several
> writes in a loop to be spec-compliant with writing the pattern to
> an offset inside the "application data block" [4] structure.

Right, so we need both NFS client side and server side local fs
support for the WRITE_SAME operation. That implies we should
implement it in the VFS as a file method, i.e. ->write_same() at a
similar layer to ->write_iter().

If we do that, then both the NFS client and the NFS server can use
the same VFS interface, and applications can use WRITE_SAME on both
NFS and local filesystems directly...

> But maybe I'm simplifying this too much, and others would find the
> additional application data block fields useful? Or should I keep
> it all inside NFS, and call it with an ioctl instead of putting it
> into the VFS?

I think a VFS file method is the right way to do this because it
allows both client side server offload and server side hardware
offload through the local filesystem. It also provides a simple way
to check if the filesystem supports the functionality or not...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx