On Tue, Mar 18, 2025 at 2:15 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> On 3/18/25 5:03 PM, Rick Macklem wrote:
> > On Tue, Mar 18, 2025 at 9:01 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >>
> >> On 3/18/25 11:09 AM, Anna Schumaker wrote:
> >>> Hi Takeshi,
> >>>
> >>> On 3/18/25 11:00 AM, Takeshi Nishimura wrote:
> >>>> Zhang Yi <yi.zhang@xxxxxxxxxx> wrote in linux-fsdevel@xxxxxxxxxxxxxxx:
> >>>>> Add support for FALLOC_FL_WRITE_ZEROES. This first allocates blocks as
> >>>>> unwritten, then issues a zero command outside of the running journal
> >>>>> handle, and finally converts them to a written state.
> >>>>
> >>>> Picking up where the NFS4.2 WRITE_SAME discussion stalled:
> >>>> FALLOC_FL_WRITE_ZEROES is coming, and IMO the only way to implement
> >>>> that for NFS is via WRITE_SAME.
> >>>>
> >>>> How to proceed?
> >>>
> >>> I've been working on patches for implementing FALLOC_FL_ZERO_RANGE support
> >>> in the NFS client using WRITE_SAME. I've also been experimenting with adding
> >>> an ioctl for the generic pattern writing part. I'm expecting to talk about
> >>> what I have for ioctl API at LSF next week, and I'll post an initial round
> >>> of patches shortly after.
> >>>
> >>> I do still need to think through any edge cases and write tests for
> >>> pynfs and fstests before anything can be merged.
> >>
> >> Takeshi, it would be immensely helpful to us if you could provide some
> >> detail about how exactly you intend to make use of WRITE_SAME so we can
> >> focus the development, review, and testing efforts.
> >>
> >> So far we don't have any specific use cases, but there is some
> >> skepticism (as voiced in the previous thread) about whether this
> >> facility will actually be useful.
> > Just fyi, there has been a similar discussion in FreeBSD land.
> > The main use case in FreeBSD land sounded like writing zeros for NVME,
> > if I followed it.
>
> We were informed that NVMe devices do not support write_same at all.

That is my understanding too. As I understand it, write_same is based on
the SCSI command that is only supported by a fairly small number of
high-end drives. NVMe does not have write_same, but some devices (I don't
know how common it is?) have an optional command called Write Zeroes,
which writes zeros to a range of blocks.
Hopefully I've gotten the above about correct?

rick

>
> But I'm more interested in why applications need to get the OS to write
> patterns. What kind of performance and scalability expectations are
> there? How big will the patterns be, how big will the target files be?
> What is the target improvement needed over an application repeatedly
> calling write(2)?
>
> After staring at COPY offload for some time, I can imagine that there
> are some DoS footguns in here that NFS servers need to watch for. Can
> WRITE_SAME return "I wrote only 16MB, please call me again to do more"?
>
>
> > My impression is that the other pattern stuff isn't very useful, since only some
> > (a few?) SCSI devices know how to do it.
>
> Well that tells us that hardware offload is a ways off, unless
> someone has a specific device they want to support.
>
>
> > The problem I see is that WRITE_SAME isn't defined in a way where the
> > NFSv4 server can only implement zeroing and fail the rest.
>
> Writing repeating patterns isn't difficult to fake for file systems
> or devices that don't have a native write_same facility. Trond suggested
> a way to do it in the previous thread.
>
>
> > As such, I am thinking that a new operation for NFSv4.2 that does writing
> > of zeros might be preferable to trying to (mis)use WRITE_SAME.
>
> I don't really understand the difference, from an application's point of
> view, between WRITE_SAME(zeroes) and DEALLOCATE. The storage behaves a
> little differently in these two cases, but what difference does it make
> to the app?
>
>
> > rick
> >
> >>
> >> For example, do you expect to have SCSI devices that accelerate
> >> WRITE_SAME?
> >> How will your applications use this? What kind (and size)
> >> of patterns do you expect to need?
> >>
> >>
> >> --
> >> Chuck Lever
> >>
> >
>
> --
> Chuck Lever
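
To make two of the points above concrete (faking a repeating pattern without device support, and a server answering "I wrote only 16MB, please call me again"), here is a rough userspace sketch. It is only an illustration under my own assumptions: it is not the NFSv4.2 WRITE_SAME wire semantics and not the approach Trond suggested in the previous thread. The names `write_same_emulated`, `write_same_all`, and `MAX_BYTES_PER_CALL` are hypothetical.

```python
import os
import tempfile

# Hypothetical per-call cap, echoing the "I wrote only 16MB" idea in the thread.
MAX_BYTES_PER_CALL = 16 * 1024 * 1024


def write_same_emulated(fd, offset, length, pattern, max_bytes=MAX_BYTES_PER_CALL):
    """Repeat `pattern` over part of [offset, offset + length) using plain pwrite.

    At most `max_bytes` (rounded down to whole patterns, minimum one pattern)
    are written per call. Returns the byte count written, which may be short.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    plen = len(pattern)
    # Cap on a pattern boundary so a follow-up call resumes in phase.
    cap = max(plen, (max_bytes // plen) * plen)
    todo = min(length, cap)
    # Staging buffer of whole patterns, roughly 1 MiB, to limit syscall count.
    buf = pattern * max(1, (1 << 20) // plen)
    done = 0
    while done < todo:
        start = done % plen  # stay in phase even after a short pwrite
        chunk = buf[start:start + min(len(buf) - start, todo - done)]
        done += os.pwrite(fd, chunk, offset + done)
    return done


def write_same_all(fd, offset, length, pattern, max_bytes=MAX_BYTES_PER_CALL):
    """Caller-side loop: keep calling until the whole range is covered."""
    written = 0
    while written < length:
        written += write_same_emulated(
            fd, offset + written, length - written, pattern, max_bytes
        )
    return written


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        total = write_same_all(f.fileno(), 0, 10, b"abc", max_bytes=4)
        print(total, os.pread(f.fileno(), 10, 0))
```

The cap bounds how much work a single request can demand, which is the DoS concern raised above; the cost is that the caller has to loop, much as it would for a short write(2).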