On Tue, 2025-01-07 at 11:55 -0500, Chuck Lever wrote:
> On 1/7/25 10:36 AM, Takeshi Nishimura wrote:
> > On Tue, Jan 7, 2025 at 4:10 PM Anna Schumaker
> > <anna.schumaker@xxxxxxxxxx> wrote:
> > >
> > > Hi Takeshi,
> > >
> > > On 1/6/25 6:56 PM, Takeshi Nishimura wrote:
> > > > Dear list,
> > > >
> > > > how can we get ADB (WRITE_SAME) support in (Debian) Linux nfsd,
> > > > and an ioctl() in Linux nfsd client to use it?
> > >
> > > Thanks for the request! Just so you're aware of the process, this
> > > email list is for upstream Linux kernel development. If we decide
> > > to go ahead with adding WRITE_SAME support it'll be up to Debian
> > > later to enable it (that part is out of our hands, and isn't up
> > > to us).
> >
> > I assume WRITE_SAME will not have a separate build flag, right?
> >
> > >
> > > >
> > > > We have a set of custom "big data" applications which could
> > > > greatly benefit from such an acceleration ABI, both for
> > > > implementing "zero data" (fill blocks with 0 bytes), and fill
> > > > blocks with identical data patterns, without sending the same
> > > > pattern over and over again over the network wire.
> > >
> > > Having said that, I'm not opposed to implementing WRITE_SAME. I
> > > wonder if we could somehow use it to build support for
> > > fallocate's FALLOC_FL_ZERO_RANGE flag at the same time.
> >
> > No, I am asking really for WRITE_SAME support to write identical
> > data to multiple locations. Like
> > https://linux.die.net/man/8/sg_write_same
> > Writing zero bytes is just a subset, and not what we need.
> > WRITE_SAME is intended as "big data" and database accelerator
> > function.
> >
> > >
> > > I'm also wondering if there would be any advantage to local
> > > filesystems if this were to be implemented as a generic system
> > > call, rather than as an NFS-specific ioctl(), since some storage
> > > devices have a WRITE_SAME operation that could be used for
> > > acceleration.
> > > But I haven't convinced myself either way yet.
> >
> > Getting a new, generic syscall in Linux takes 3-5 years on average.
> > By then our project will be finished, or renewed with new funding,
> > but all without getting a boost from WRITE_SAME support in NFS-
>
> For comparison:
>
> Adding WRITE_SAME to the Linux NFS client and server implementation
> is on the same order of time -- a year (or perhaps less), then
> getting it into Debian stable will be more than 1 year, probably 2
> or 3 (at a guess).
>
> A better approach would be for your team to implement what they need,
> use it for your project (ie, custom build your kernels), then
> contribute it to upstream so others can use it too. That would
> demonstrate there is real user demand for this facility, and your
> code will have gained some miles in production.
>
> You could hire a consultant to implement it for you on a time frame
> that is your choosing.
>
> Upstream prioritizes economy of maintenance over code velocity;
> meaning, how quickly a feature can be prototyped and productized is
> less important to us than how much the feature will cost us to
> maintain in the long run.
>
> With my NFSD co-maintainer hat on: I would accept a WRITE_SAME
> implementation, but it would have to come with tests -- pynfs and
> xfstests are the usual test harnesses that can accommodate those.
>
> In addition, NFSD is responsible only for the network protocol. The
> local file system implementations have to handle the heavy lifting.
> It's not clear to me what infrastructure is already available in
> Linux file systems; that will take some research. (I think that is
> what Anna was hinting at).
>

This functionality should be possible to implement using the
clone_range ioctl() on the server or on the client for that matter.
Yes, you'll have to use multiple clone_range calls, but you can use a
geometric series to do it efficiently (i.e.
write pattern, clone pattern, clone 2*pattern, clone 4*pattern, etc.).
It's not hard to do, and the advantage is that it can work for all
filesystems that implement clone_range. You'd not be limited to just
using NFS with a special WRITE_SAME ioctl. Furthermore, doing it this
way is space-efficient on most filesystems.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
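
For illustration only, here is a minimal userspace sketch of the doubling scheme described above: write the pattern once, then clone the already-filled prefix onto the end of the file until the target size is reached. The helper names doubling_plan() and write_same() are hypothetical, not an existing API. The sketch assumes the clone_range ioctl is FICLONERANGE from <linux/fs.h>, that the file lives on a filesystem that supports it (a reflink-capable local filesystem, or an NFSv4.2 mount whose server implements CLONE), and that the pattern length and total length are multiples of the filesystem block size, since clone offsets and lengths generally have to be block-aligned.

```python
import fcntl
import os
import struct

# FICLONERANGE from <linux/fs.h>: _IOW(0x94, 13, struct file_clone_range).
# Assumption: this numeric value is for the common ioctl encoding
# (e.g. x86, arm64); some architectures encode ioctl numbers differently.
FICLONERANGE = 0x4020940D


def doubling_plan(pattern_len, total_len):
    """Return (src_offset, length, dest_offset) clone steps.

    Starting from one copy of the pattern at offset 0, each step clones
    the already-filled prefix to the end of the filled region, so the
    filled size doubles until total_len is reached.
    """
    if pattern_len <= 0 or total_len < pattern_len:
        raise ValueError("need 0 < pattern_len <= total_len")
    steps = []
    filled = pattern_len
    while filled < total_len:
        # The last step may clone only part of the prefix.
        length = min(filled, total_len - filled)
        steps.append((0, length, filled))
        filled += length
    return steps


def write_same(fd, pattern, total_len):
    """Hypothetical helper: fill [0, total_len) of fd with `pattern`."""
    os.pwrite(fd, pattern, 0)
    for src_off, length, dest_off in doubling_plan(len(pattern), total_len):
        # struct file_clone_range { s64 src_fd; u64 src_offset;
        #                           u64 src_length; u64 dest_offset; }
        arg = struct.pack("=qQQQ", fd, src_off, length, dest_off)
        fcntl.ioctl(fd, FICLONERANGE, arg)
```

The number of clone calls grows with the logarithm of total_len / pattern_len rather than linearly, and source and destination ranges never overlap because each destination starts where the filled prefix ends. On a filesystem without clone support the ioctl is expected to fail with an error such as EOPNOTSUPP, so a real caller would need a fallback to ordinary writes.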