Re: Needed: ADB (WRITE_SAME) support in Linux nfsd

Chuck Lever <chuck.lever@xxxxxxxxxx> · Tue, 7 Jan 2025 11:55:55 -0500

On 1/7/25 10:36 AM, Takeshi Nishimura wrote:
On Tue, Jan 7, 2025 at 4:10 PM Anna Schumaker <anna.schumaker@xxxxxxxxxx> wrote:

Hi Takeshi,

On 1/6/25 6:56 PM, Takeshi Nishimura wrote:
Dear list,

how can we get ADB (WRITE_SAME) support in (Debian) Linux nfsd, and an
ioctl() in the Linux NFS client to use it?

Thanks for the request! Just so you're aware of the process, this email list is for upstream Linux kernel development. If we decide to go ahead with adding WRITE_SAME support, it will be up to Debian to enable it later (that part is out of our hands).

I assume WRITE_SAME will not have a separate build flag, right?

We have a set of custom "big data" applications which could greatly
benefit from such an acceleration ABI, both for implementing "zero
data" (filling blocks with 0 bytes) and for filling blocks with
identical data patterns, without sending the same pattern over and
over again over the network wire.

Having said that, I'm not opposed to implementing WRITE_SAME. I wonder if we could somehow use it to build support for fallocate's FALLOC_FL_ZERO_RANGE flag at the same time.

No, I am really asking for WRITE_SAME support to write identical data
to multiple locations. Like https://linux.die.net/man/8/sg_write_same
Writing zero bytes is just a subset, and not what we need. WRITE_SAME
is intended as a "big data" and database accelerator function.

I'm also wondering if there would be any advantage to local filesystems if this were to be implemented as a generic system call, rather than as an NFS-specific ioctl(), since some storage devices have a WRITE_SAME operation that could be used for acceleration. But I haven't convinced myself either way yet.

Getting a new, generic syscall in Linux takes 3-5 years on average. By
then our project will be finished, or renewed with new funding, but
all without getting a boost from WRITE_SAME support in NFS.

For comparison:

Adding WRITE_SAME to the Linux NFS client and server implementation is
on the same order of time -- a year (or perhaps less), then getting it
into Debian stable will be more than 1 year, probably 2 or 3 (at a
guess).

A better approach would be for your team to implement what they need,
use it for your project (i.e., custom build your kernels), then contribute
it to upstream so others can use it too. That would demonstrate there is
real user demand for this facility, and your code will have gained some
miles in production.

You could hire a consultant to implement it for you on a time frame of
your choosing.

Upstream prioritizes economy of maintenance over code velocity; meaning,
how quickly a feature can be prototyped and productized is less
important to us than how much the feature will cost us to maintain in
the long run.

With my NFSD co-maintainer hat on: I would accept a WRITE_SAME
implementation, but it would have to come with tests -- pynfs and
xfstests are the usual test harnesses that can accommodate those.

In addition, NFSD is responsible only for the network protocol. The
local file system implementations have to handle the heavy lifting.
It's not clear to me what infrastructure is already available in Linux
file systems; that will take some research. (I think that is what
Anna was hinting at).
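
As a rough illustration of the work that falls to the local side when
nothing below NFSD can offload it, here is a minimal user-space sketch
in Python (not kernel code). The helper name write_same_fallback and its
parameters are hypothetical, and the zero padding of short patterns is an
assumption of this sketch, not the RFC 7862 wire semantics. It simply
writes one pattern block to every block in the range with ordinary
pwrite() calls:

```python
import os
import tempfile


def write_same_fallback(fd, offset, block_size, block_count, pattern):
    """Emulate a WRITE_SAME-style fill with ordinary writes.

    Hypothetical helper: each of the block_count blocks starting at
    offset receives the same pattern, zero-padded to block_size
    (the padding rule is an assumption of this sketch).
    Returns the number of payload bytes handed to pwrite().
    """
    if block_size <= 0 or block_count < 0:
        raise ValueError("invalid block geometry")
    if len(pattern) > block_size:
        raise ValueError("pattern is larger than one block")

    block = pattern.ljust(block_size, b"\0")
    written = 0
    for i in range(block_count):
        pos = offset + i * block_size
        view = memoryview(block)
        # pwrite() may write fewer bytes than requested, so loop.
        while len(view) > 0:
            n = os.pwrite(fd, view, pos)
            view = view[n:]
            pos += n
            written += n
    return written


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        pattern = b"\xde\xad\xbe\xef"
        total = write_same_fallback(f.fileno(), 4096, 512, 8, pattern)
        # The fallback moves `total` bytes; a WRITE_SAME request would
        # only need to carry the pattern itself plus a block count.
        print("bytes written by fallback:", total)
        print("pattern bytes a client would send:", len(pattern))
```

A file system or storage device with native support could replace this
loop with a single offloaded operation; whether Linux file systems
expose a suitable hook today is the part that needs research.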

--
Chuck Lever