Re: [LSF/MM/BPF TOPIC] Implementing the NFS v4.2 WRITE_SAME operation: VFS or NFS ioctl() ?

[Please word wrap email text at 68-72 columns]

Anna, I think we need to consider how to integrate this
functionality across the entire storage stack, not just for NFS
client/server optimisation.  My comments are made with this in mind.

On Tue, Jan 14, 2025 at 04:38:03PM -0500, Anna Schumaker wrote:
> I've seen a few requests for implementing the NFS v4.2 WRITE_SAME
> [1] operation over the last few months [2][3] to accelerate
> writing patterns of data on the server, so it's been in the back
> of my mind for a future project. I'll need to write some code
> somewhere so NFS & NFSD can handle this request. I could keep any
> implementation internal to NFS / NFSD, but I'd like to find out if
> local filesystems would find this sort of feature useful and if I
> should put it in the VFS instead.

How closely does this match the block device WRITE_SAME
(SCSI/NVMe) commands? I note there is a reference to this in the
RFC, but there are no details given.

i.e. is this NFS request something we can pass straight through to
the server side storage hardware if it supports hardware WRITE_SAME
commands, or do they have incompatible semantics?

If the two are compatible, then I think we really want server side
hardware offload to be possible. That requires the filesystem to
allocate/map the physical storage and then call into the block layer
to either offload it to the hardware or emulate it in software
(similar to how blkdev_issue_zeroout() works).
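The offload-or-emulate split described above can be sketched in
plain userspace C. This is only a toy model under stated assumptions:
struct toy_dev, its write_same hook and toy_issue_write_same() are
hypothetical names, not block layer interfaces, and the "media" is
just a memory buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device model; write_same is NULL without offload. */
struct toy_dev {
	unsigned char *media;
	size_t nblocks;
	size_t block_size;
	int (*write_same)(struct toy_dev *dev, size_t lba, size_t nr,
			  const void *block);
};

/* Try the offload hook first, then emulate with per-block copies. */
static int toy_issue_write_same(struct toy_dev *dev, size_t lba,
				size_t nr, const void *block)
{
	if (lba > dev->nblocks || nr > dev->nblocks - lba)
		return -EINVAL;

	if (dev->write_same) {
		int ret = dev->write_same(dev, lba, nr, block);

		/* Fall through only if the device refused the command. */
		if (ret != -EOPNOTSUPP)
			return ret;
	}

	/* Software emulation: copy one block at a time. */
	for (size_t i = 0; i < nr; i++)
		memcpy(dev->media + (lba + i) * dev->block_size,
		       block, dev->block_size);
	return 0;
}
```

The caller sees the same result either way; only the cost differs.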

> I was thinking I could keep it simple, and model a function call
> based on write(3) / pwrite(3) to write some pattern N times
> starting at either the file's current offset or at a user-provided
> offset. Something like:
>
> write_pattern(int filedes, const void *pattern, size_t nbytes, size_t count);
> pwrite_pattern(int filedes, const void *pattern, size_t nbytes, size_t count, offset_t offset);

Apart from noting that pwritev2(RWF_ENCODED) would have been able to
support this, I'll let other people decide what the best
user/syscall API will be for this.
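For reference, the semantics of the proposed pwrite_pattern() can be
emulated today with a loop over pwrite(2). The sketch below is a
hypothetical userspace fallback, not a proposed implementation: the
name pwrite_pattern_emulated() and its return convention are
assumptions, and it uses off_t for the offset.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `count` copies of the `nbytes`-long pattern starting at
 * `offset`.  Returns the number of bytes written, or -1 with errno
 * set if nothing could be written.
 */
static ssize_t pwrite_pattern_emulated(int fd, const void *pattern,
				       size_t nbytes, size_t count,
				       off_t offset)
{
	const char *p = pattern;
	size_t total = 0;

	if (nbytes == 0) {
		errno = EINVAL;
		return -1;
	}

	for (size_t i = 0; i < count; i++) {
		size_t done = 0;

		/* pwrite() may be short, so finish each copy. */
		while (done < nbytes) {
			ssize_t ret = pwrite(fd, p + done, nbytes - done,
					     offset + (off_t)total);

			if (ret < 0 && errno == EINTR)
				continue;
			if (ret == 0)
				errno = EIO;
			if (ret <= 0)
				return total ? (ssize_t)total : -1;
			done += (size_t)ret;
			total += (size_t)ret;
		}
	}
	return (ssize_t)total;
}
```

This issues one write per copy of the pattern, which is the per-copy
overhead a real WRITE_SAME interface is meant to avoid.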

> I could then construct a WRITE_SAME call in the NFS client using
> this information. This seems "good enough" to me for what people
> have asked for, at least as a client-side interface. It wouldn't
> really help the server, which would still need to do several
> writes in a loop to be spec-compliant with writing the pattern to
> an offset inside the "application data block" [4] structure.

Right, so we need both NFS client side and server side local fs
support for the WRITE_SAME operation.

That implies we should implement it at the VFS as a file method.
i.e. ->write_same() at a similar layer to ->write_iter().

If we do that, then both the NFS client and the NFS server can use
the same VFS interface, and applications can use WRITE_SAME on both
NFS and local filesystems directly...

> But maybe I'm simplifying this too much, and others would find the
> additional application data block fields useful? Or should I keep
> it all inside NFS, and call it with an ioctl instead of putting it
> into the VFS?

I think a VFS file method is the right way to do this because it
allows both client side server offload and server side hardware
offload through the local filesystem. It also provides a simple way
to check if the filesystem supports the functionality or not...
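The support check could look roughly like the userspace model below.
Everything in it is hypothetical: struct toy_file_ops stands in for a
method table with a ->write_same() slot, the argument list is borrowed
from the pwrite_pattern() proposal quoted earlier, and
toy_fs_write_same() is a stub that only reports the bytes covered.

```c
#include <errno.h>
#include <stddef.h>

struct toy_file;

/* Hypothetical method table; only the new method is modelled. */
struct toy_file_ops {
	long long (*write_same)(struct toy_file *file,
				const void *pattern, size_t nbytes,
				size_t count, long long pos);
};

struct toy_file {
	const struct toy_file_ops *ops;
};

/* Stub method: pretends to succeed, returns bytes covered. */
static long long toy_fs_write_same(struct toy_file *file,
				   const void *pattern, size_t nbytes,
				   size_t count, long long pos)
{
	(void)file;
	(void)pattern;
	(void)pos;
	return (long long)(nbytes * count);
}

static const struct toy_file_ops toy_fs_ops = {
	.write_same = toy_fs_write_same,
};

/* A filesystem that never filled in the method. */
static const struct toy_file_ops toy_legacy_ops = {
	.write_same = NULL,
};

/* Dispatch helper: a missing method means "not supported". */
static long long toy_vfs_write_same(struct toy_file *file,
				    const void *pattern, size_t nbytes,
				    size_t count, long long pos)
{
	if (!file->ops || !file->ops->write_same)
		return -EOPNOTSUPP;
	return file->ops->write_same(file, pattern, nbytes, count, pos);
}
```

A caller that gets -EOPNOTSUPP back can fall back to ordinary writes,
so an NFS client, an NFS server and a local application could all
share the same entry point.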

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



