Re: Supporting FALLOC_FL_WRITE_ZEROES in NFS4.2 with WRITE_SAME?


On 3/18/25 5:03 PM, Rick Macklem wrote:
> On Tue, Mar 18, 2025 at 9:01 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>
>> On 3/18/25 11:09 AM, Anna Schumaker wrote:
>>> Hi Takeshi,
>>>
>>> On 3/18/25 11:00 AM, Takeshi Nishimura wrote:
>>>> Zhang Yi <yi.zhang@xxxxxxxxxx> wrote in linux-fsdevel@xxxxxxxxxxxxxxx:
>>>>> Add support for FALLOC_FL_WRITE_ZEROES. This first allocates blocks as
>>>>> unwritten, then issues a zero command outside of the running journal
>>>>> handle, and finally converts them to a written state.
>>>>
>>>> Picking up where the NFS4.2 WRITE_SAME discussion stalled:
>>>> FALLOC_FL_WRITE_ZEROES is coming, and IMO the only way to implement
>>>> that for NFS is via WRITE_SAME.
>>>>
>>>> How to proceed?
>>>
>>> I've been working on patches for implementing FALLOC_FL_ZERO_RANGE support
>>> in the NFS client using WRITE_SAME. I've also been experimenting with adding
>>> an ioctl for the generic pattern writing part. I'm expecting to talk about
>>> what I have for ioctl API at LSF next week, and I'll post an initial round
>>> of patches shortly after.
>>>
>>> I do still need to think through any edge cases and write tests for
>>> pynfs and fstests before anything can be merged.
>>
>> Takeshi, it would be immensely helpful to us if you could provide some
>> detail about how exactly you intend to make use of WRITE_SAME so we can
>> focus the development, review, and testing efforts.
>>
>> So far we don't have any specific use cases, but there is some
>> skepticism (as voiced in the previous thread) about whether this
>> facility will actually be useful.
> Just FYI, there has been a similar discussion in FreeBSD land.
> The main use case there sounded like writing zeros for NVMe,
> if I followed it.

We were informed that NVMe devices do not support write_same at all.

But I'm more interested in why applications need to get the OS to write
patterns. What kind of performance and scalability expectations are
there? How big will the patterns be, how big will the target files be?
What is the target improvement needed over an application repeatedly
calling write(2)?
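For reference, here is a minimal sketch of the naive write(2) baseline: one pwrite(2) per copy of the pattern. The helper name `fill_pattern_naive` is hypothetical, not an existing API. It returns the number of system calls it issued so the cost of the baseline is easy to see.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical baseline: fill [offset, offset + length) with one pwrite(2)
 * per copy of the pattern. Returns the number of system calls issued, or
 * -1 on error. If length is not a multiple of patlen, the last call writes
 * a partial copy.
 */
static long fill_pattern_naive(int fd, const void *pattern, size_t patlen,
                               off_t offset, size_t length)
{
    const char *pat = pattern;
    long calls = 0;
    size_t done = 0;

    if (patlen == 0) {
        errno = EINVAL;
        return -1;
    }
    while (done < length) {
        /* Nonzero only if a previous pwrite(2) was short. */
        size_t pos = done % patlen;
        size_t want = patlen - pos;
        if (want > length - done)
            want = length - done;
        ssize_t n = pwrite(fd, pat + pos, want, offset + (off_t)done);
        calls++;
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            if (n == 0)
                errno = EIO;
            return -1;
        }
        done += (size_t)n;
    }
    return calls;
}
```

At this rate, filling 1 GiB with a 512-byte pattern costs 2^30 / 2^9 = 2,097,152 pwrite(2) calls, which gives a sense of the number an offload would need to beat.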

After staring at COPY offload for some time, I can imagine that there
are some DoS footguns in here that NFS servers need to watch for. Can
WRITE_SAME return "I wrote only 16MB, please call me again to do more"?
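Below is a minimal sketch of the bounded-progress behavior asked about here. It assumes a server may report a short count and that the client reissues the request for the remainder. Everything in it is hypothetical: `srv_write_same`, `clnt_write_same`, and the per-call cap `SRV_MAX_PER_CALL` are illustrative names, not NFS or Linux interfaces. To keep it testable, the "server" writes into an in-memory array rather than a file.

```c
#include <stddef.h>

/* Hypothetical per-call work cap. The email's example is 16 MB; it is
 * scaled down to 16 bytes here so the demo stays small. */
#define SRV_MAX_PER_CALL ((size_t)16)

/*
 * Hypothetical server handler: fill store[offset, offset + len) with the
 * pattern, starting at pattern index 'phase', but do no more than
 * SRV_MAX_PER_CALL bytes of work. Returns the bytes actually written.
 */
static size_t srv_write_same(unsigned char *store, size_t offset, size_t len,
                             const unsigned char *pat, size_t patlen,
                             size_t phase)
{
    size_t n = len < SRV_MAX_PER_CALL ? len : SRV_MAX_PER_CALL;
    for (size_t i = 0; i < n; i++)
        store[offset + i] = pat[(phase + i) % patlen];
    return n;
}

/*
 * Hypothetical client: keep asking for the remainder until the whole range
 * is done. *rounds reports how many round trips that took.
 */
static size_t clnt_write_same(unsigned char *store, size_t offset, size_t len,
                              const unsigned char *pat, size_t patlen,
                              int *rounds)
{
    size_t done = 0;

    *rounds = 0;
    if (patlen == 0)
        return 0;
    while (done < len) {
        /* Resume at the right pattern phase for the next byte. */
        size_t n = srv_write_same(store, offset + done, len - done,
                                  pat, patlen, done % patlen);
        (*rounds)++;
        if (n == 0)
            break;              /* no progress: stop instead of spinning */
        done += n;
    }
    return done;
}
```

Capping the work per call bounds how long one request can occupy a server thread. The cost is extra round trips: four in this demo for 50 bytes under a 16-byte cap.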


> My impression is that the other pattern stuff isn't very useful, since only some
> (a few?) SCSI devices know how to do it.

Well, that tells us that hardware offload is a ways off, unless
someone has a specific device they want to support.


> The problem I see is that WRITE_SAME isn't defined in a way that lets an
> NFSv4 server implement only zeroing and fail the rest.

Writing repeating patterns isn't difficult to fake for file systems
or devices that don't have a native write_same facility. Trond suggested
a way to do it in the previous thread.
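The following is one way to fake WRITE_SAME in software. It is not necessarily the approach Trond suggested in the earlier thread, which isn't reproduced here. The idea is to expand the pattern into a large buffer by repeated doubling, then write the range in buffer-sized chunks. `fill_pattern_buffered` is a hypothetical helper. The buffer size is a tuning assumption, and the code rounds it down to a multiple of the pattern length so that each chunk starts at pattern offset zero.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical software emulation of a pattern write: build one buffer of
 * repeated pattern copies, then pwrite(2) it in chunks. Returns the number
 * of pwrite(2) calls issued, or -1 on error.
 */
static long fill_pattern_buffered(int fd, const void *pattern, size_t patlen,
                                  off_t offset, size_t length, size_t bufsize)
{
    if (patlen == 0 || bufsize < patlen) {
        errno = EINVAL;
        return -1;
    }
    bufsize -= bufsize % patlen;    /* keep chunks pattern-aligned */

    char *buf = malloc(bufsize);
    if (buf == NULL)
        return -1;

    /* Doubling: copy the already-filled prefix onto the unfilled tail. */
    memcpy(buf, pattern, patlen);
    size_t filled = patlen;
    while (filled < bufsize) {
        size_t n = filled < bufsize - filled ? filled : bufsize - filled;
        memcpy(buf + filled, buf, n);
        filled += n;
    }

    long calls = 0;
    size_t done = 0;
    while (done < length) {
        /* Nonzero only after a short write; still pattern-correct because
         * bufsize is a multiple of patlen. */
        size_t pos = done % bufsize;
        size_t want = bufsize - pos;
        if (want > length - done)
            want = length - done;
        ssize_t n = pwrite(fd, buf + pos, want, offset + (off_t)done);
        calls++;
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            int saved = (n == 0) ? EIO : errno;
            free(buf);
            errno = saved;
            return -1;
        }
        done += (size_t)n;
    }
    free(buf);
    return calls;
}
```

For 1000 bytes of a 3-byte pattern with a 64-byte buffer, this issues 16 pwrite(2) calls, compared with 334 for a one-copy-per-call loop. With a buffer of a few MiB, the call count drops to roughly length / bufsize.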


> As such, I am thinking that a new NFSv4.2 operation that writes zeros
> might be preferable to trying to (mis)use WRITE_SAME.

I don't really understand the difference, from an application's point of
view, between WRITE_SAME(zeroes) and DEALLOCATE. The storage behaves a
little differently in these two cases, but what difference does it make
to the app?
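To make the application-level comparison concrete, here is a sketch that assumes a local Linux file. The Linux NFSv4.2 client implements `FALLOC_FL_PUNCH_HOLE` with DEALLOCATE. The sketch tries hole punching first and falls back to writing zeroes if that is unsupported. Either way, the range reads back as zeroes, and that is all read(2) can observe. `make_range_zero` and `range_reads_zero` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Make [off, off + len) read back as zeroes. First try deallocation
 * (PUNCH_HOLE must be paired with KEEP_SIZE). If the file system does not
 * support it, write zeroes explicitly. *punched records which path ran.
 */
static int make_range_zero(int fd, off_t off, off_t len, int *punched)
{
    static const char zeroes[4096];

    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  off, len) == 0) {
        *punched = 1;
        return 0;
    }
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;

    *punched = 0;
    while (len > 0) {
        size_t want = len < (off_t)sizeof zeroes ? (size_t)len : sizeof zeroes;
        ssize_t n = pwrite(fd, zeroes, want, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            if (n == 0)
                errno = EIO;
            return -1;
        }
        off += n;
        len -= n;
    }
    return 0;
}

/* What the application sees: returns 1 if every byte in the range is 0. */
static int range_reads_zero(int fd, off_t off, size_t len)
{
    char buf[4096];

    while (len > 0) {
        size_t want = len < sizeof buf ? len : sizeof buf;
        ssize_t n = pread(fd, buf, want, off);
        if (n <= 0)
            return 0;
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != 0)
                return 0;
        off += n;
        len -= (size_t)n;
    }
    return 1;
}
```

The `punched` flag exists only so the caller can tell which path ran. Nothing returned by pread(2) differs between the two paths. Storage-side differences, such as whether the range stays allocated, show up elsewhere (for example, through SEEK_HOLE or a later ENOSPC) rather than in the data.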


> rick
> 
>>
>> For example, do you expect to have SCSI devices that accelerate
>> WRITE_SAME? How will your applications use this? What kind (and size)
>> of patterns do you expect to need?
>>
>>
>> --
>> Chuck Lever
>>


-- 
Chuck Lever



