Re: Supporting FALLOC_FL_WRITE_ZEROES in NFS4.2 with WRITE_SAME?

On Tue, Mar 18, 2025 at 2:15 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> On 3/18/25 5:03 PM, Rick Macklem wrote:
> > On Tue, Mar 18, 2025 at 9:01 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >>
> >> On 3/18/25 11:09 AM, Anna Schumaker wrote:
> >>> Hi Takeshi,
> >>>
> >>> On 3/18/25 11:00 AM, Takeshi Nishimura wrote:
> >>>> Zhang Yi <yi.zhang@xxxxxxxxxx> wrote in linux-fsdevel@xxxxxxxxxxxxxxx:
> >>>>> Add support for FALLOC_FL_WRITE_ZEROES. This first allocates blocks as
> >>>>> unwritten, then issues a zero command outside of the running journal
> >>>>> handle, and finally converts them to a written state.
> >>>>
> >>>> Picking up where the NFS4.2 WRITE_SAME discussion stalled:
> >>>> FALLOC_FL_WRITE_ZEROES is coming, and IMO the only way to implement
> >>>> that for NFS is via WRITE_SAME.
> >>>>
> >>>> How to proceed?
> >>>
> >>> I've been working on patches for implementing FALLOC_FL_ZERO_RANGE support
> >>> in the NFS client using WRITE_SAME. I've also been experimenting with adding
> >>> an ioctl for the generic pattern writing part. I'm expecting to talk about
> >>> what I have for ioctl API at LSF next week, and I'll post an initial round
> >>> of patches shortly after.
> >>>
> >>> I do still need to think through any edge cases and write tests for
> >>> pynfs and fstests before anything can be merged.
> >>
> >> Takeshi, it would be immensely helpful to us if you could provide some
> >> detail about how exactly you intend to make use of WRITE_SAME so we can
> >> focus the development, review, and testing efforts.
> >>
> >> So far we don't have any specific use cases, but there is some
> >> skepticism (as voiced in the previous thread) about whether this
> >> facility will actually be useful.
> > Just fyi, there has been a similar discussion in FreeBSD land.
> > The main use case in FreeBSD land sounded like writing zeros for NVME,
> > if I followed it.
>
> We were informed that NVMe devices do not support write_same at all.
That is my understanding too. As I understand it, write_same is based on the
SCSI WRITE SAME command, which is only supported by a fairly small number of
high-end drives.

NVMe does not have write_same, but some devices (I don't know how common
this is) support an optional command called Write Zeroes, which zeroes a
range of blocks.

Hopefully I've gotten the above about right? rick

>
> But I'm more interested in why applications need to get the OS to write
> patterns. What kind of performance and scalability expectations are
> there? How big will the patterns be, how big will the target files be?
> What is the target improvement needed over an application repeatedly
> calling write(2) ?
>
> After staring at COPY offload for some time, I can imagine that there
> are some DoS footguns in here that NFS servers need to watch for. Can
> WRITE_SAME return "I wrote only 16MB, please call me again to do more"?
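
To make the "I wrote only 16MB, please call me again" idea concrete, here is a
minimal Python sketch of a bounded zeroing call plus the caller loop that
reissues it. This is not NFS code and not how any server implements
WRITE_SAME. The names zero_bounded and zero_range and the CAP value are
hypothetical, chosen only for illustration.

```python
import os

CAP = 16 * 1024 * 1024  # hypothetical per-call limit (the 16MB figure above)
BUF = 1024 * 1024       # size of the reusable zero buffer

def zero_bounded(fd, offset, length, cap=CAP):
    """Zero at most `cap` bytes starting at `offset`.

    Returns the number of bytes actually zeroed, which may be less than
    `length`. This models the "partial completion" reply.
    """
    todo = min(length, cap)
    zeros = bytes(min(BUF, todo))
    done = 0
    while done < todo:
        # pwrite may write fewer bytes than requested, so advance by its result
        done += os.pwrite(fd, zeros[:todo - done], offset + done)
    return done

def zero_range(fd, offset, length, cap=CAP):
    """Caller side: reissue the bounded request until the range is covered.

    Returns how many bounded calls were needed.
    """
    calls = 0
    done = 0
    while done < length:
        done += zero_bounded(fd, offset + done, length - done, cap)
        calls += 1
    return calls
```

With a cap like this, the work done per request stays bounded no matter how
large a range the caller asks for, at the cost of extra round trips.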
>
>
> > My impression is that the other pattern stuff isn't very useful, since only some
> > (a few?) SCSI devices know how to do it.
>
> Well that tells us that hardware offload is a ways off, unless
> someone has a specific device they want to support.
>
>
> > The problem I see is that WRITE_SAME isn't defined in a way where the
> > NFSv4 server can implement only zeroing and fail the rest.
>
> Writing repeating patterns isn't difficult to fake for file systems
> or devices that don't have a native write_same facility. Trond suggested
> a way to do it in the previous thread.
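
As an illustration of faking a repeating pattern with ordinary writes, here is
a minimal Python sketch. It is an assumption about what such a fallback could
look like, not the method Trond suggested in the previous thread, which is not
reproduced here. The function name fill_pattern and its chunk parameter are
hypothetical.

```python
import os

def fill_pattern(fd, offset, length, pattern, chunk=1 << 20):
    """Fill `length` bytes at `offset` with `pattern` repeated end to end.

    Uses plain pwrite calls, so it needs no device or file system support.
    Returns the number of bytes written.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # Buffer holding a whole number of pattern repetitions (at least one).
    buf = pattern * max(1, chunk // len(pattern))
    done = 0
    while done < length:
        # Resume at the right position within the pattern, so a short
        # write does not shift the phase of the bytes that follow.
        start = done % len(pattern)
        piece = buf[start:start + (length - done)]
        done += os.pwrite(fd, piece, offset + done)
    return done
```

The final repetition is truncated when `length` is not a multiple of the
pattern size. Whether a real WRITE_SAME implementation should allow that or
reject it is a separate question that this sketch does not answer.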
>
>
> > As such, I am thinking that a new operation for NFSv4.2 that does writing
> > of zeros might be preferable to trying to (mis)use WRITE_SAME.
>
> I don't really understand the difference, from an application's point of
> view, between WRITE_SAME(zeroes) and DEALLOCATE. The storage behaves a
> little differently in these two cases, but what difference does it make
> to the app?
>
>
> > rick
> >
> >>
> >> For example, do you expect to have SCSI devices that accelerate
> >> WRITE_SAME? How will your applications use this? What kind (and size)
> >> of patterns do you expect to need?
> >>
> >>
> >> --
> >> Chuck Lever
> >>
>
>
> --
> Chuck Lever




